Patent abstract:
Abstract: "Systems and methods for detecting rare mutations and copy number variation". The present invention provides a system and method for detecting rare mutations and copy number variations in cell-free polynucleotides. Generally, the systems and methods comprise sample preparation, that is, the extraction and isolation of cell-free polynucleotide sequences from a bodily fluid; subsequent sequencing of the cell-free polynucleotides by techniques known in the art; and application of bioinformatics tools to detect rare mutations and copy number variations relative to a reference. The systems and methods may also contain a database or collection of rare mutation or copy number variation profiles from different diseases, to be used as additional references aiding in the detection of rare mutations, copy number variation profiles or general genetic profiles of a disease.
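Publication number: BR112015004847A2
Application number: R112015004847
Filing date: 2013-09-04
Publication date: 2020-04-22
Inventors: Talasaz Amirali; Eltoukhy Helmy
Applicant: Guardant Health Inc
Main IPC classification:
Patent description: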

METHODS FOR DETECTING COPY NUMBER VARIATION, FOR DETECTING A RARE MUTATION IN A SAMPLE AND FOR CHARACTERIZING THE HETEROGENEITY OF AN ABNORMAL CONDITION IN AN INDIVIDUAL
CROSS-REFERENCE [001] This application claims priority from U.S. Provisional Patent Application No. 61/696,734, filed September 4, 2012; U.S. Provisional Patent Application No. 61/704,400, filed September 21, 2012; U.S. Provisional Patent Application No. 61/793,997, filed March 15, 2013; and U.S. Provisional Patent Application No. 61/845,987, filed July 13, 2013, each of which is incorporated herein by reference in its entirety for all purposes.
BACKGROUND OF THE INVENTION [002] The detection and quantification of polynucleotides are important for molecular biology and for medical applications such as diagnostics. Genetic testing is particularly useful for a number of diagnostic methods. For example, disorders caused by rare genetic alterations (for example, sequence variants) or by changes in epigenetic markers, such as cancer and partial or complete aneuploidy, can be detected or more precisely characterized with DNA sequence information.
[003] Early detection and monitoring of genetic diseases, such as cancer, is often useful and necessary for the successful treatment or management of the disease. One approach may include monitoring a sample derived from cell-free nucleic acids,
Petition 870160049132, of 9/5/2016, p. 4/177
a population of polynucleotides that can be found in different types of bodily fluids. In some cases, the disease can be characterized or detected based on the detection of genetic aberrations, such as a change in copy number and/or sequence variation of one or more nucleic acid sequences, or the development of certain other rare genetic alterations. Cell-free DNA (cfDNA) has been known in the art for decades and may contain genetic aberrations associated with a particular disease. With improvements in sequencing and in techniques for manipulating nucleic acids, there is a need in the art for improved methods and systems that use cell-free DNA to detect and monitor disease.
SUMMARY OF THE INVENTION [004] The disclosure provides a method for detecting copy number variation comprising: a) sequencing extracellular polynucleotides from a bodily sample from an individual, wherein each of the extracellular polynucleotides is optionally attached to a unique barcode; b) filtering out reads that fail to meet a set threshold; c) mapping the sequence reads obtained in step (a) to a reference sequence; d) quantifying/counting the mapped reads in two or more predefined regions of the reference sequence; and e) determining a copy number variation in one or more of the predefined regions by (i) normalizing the number of reads in the predefined regions to each other and/or the number of unique barcodes in the predefined regions to each other; and (ii) comparing the normalized numbers obtained in step (i) to normalized numbers obtained from a control sample.
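A minimal Python sketch of steps (c) through (e) follows, assuming mapped reads are summarized by their positions on a single chromosome and that normalizing the regions "to each other" is done by dividing each region's count by the median count across regions. The function names and the median scheme are illustrative choices, not the normalization specified in the disclosure; ratios departing from 1.0 indicate a relative gain or loss in that region.

```python
from statistics import median


def bin_counts(read_positions, chrom_length, bin_size):
    """Count mapped read positions falling into fixed-size, non-overlapping bins."""
    n_bins = (chrom_length + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for pos in read_positions:
        counts[pos // bin_size] += 1
    return counts


def normalize(counts):
    """Express each bin count relative to the median bin count (assumed scheme)."""
    m = median(counts)
    if m == 0:
        raise ValueError("median bin count is zero; cannot normalize")
    return [c / m for c in counts]


def copy_number_ratios(sample_counts, control_counts):
    """Per-bin ratio of normalized sample counts to normalized control counts."""
    sample = normalize(sample_counts)
    control = normalize(control_counts)
    return [s / c if c > 0 else float("nan") for s, c in zip(sample, control)]


# Hypothetical example: bin 2 carries twice the normalized depth of the control.
print(copy_number_ratios([100, 100, 200, 100], [100, 100, 100, 100]))  # prints [1.0, 1.0, 2.0, 1.0]
```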
[005] The disclosure also provides a method for detecting a rare mutation in a cell-free or substantially cell-free sample obtained from an individual, comprising: a) sequencing extracellular polynucleotides from a bodily sample from the individual, wherein each of the extracellular polynucleotides generates a plurality of sequencing reads; b) filtering out reads that fail to meet a set threshold; c) mapping the sequence reads derived from the sequencing onto a reference sequence; d) identifying a subset of mapped sequence reads that align with a variant of the reference sequence at each mappable base position; e) for each mappable base position, calculating a ratio of (a) the number of mapped sequence reads that include a variant relative to the reference sequence to (b) the total number of sequence reads at that mappable base position; f) normalizing the ratios or variant frequencies at each mappable base position and determining potential rare variant(s) or mutation(s); and g) comparing the resulting number for each of the regions with potential rare variant(s) or mutation(s) to numbers similarly derived from a reference sample.
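The per-position ratio and the comparison against a reference sample in [005] can be sketched as below. The pileup representation, the background-subtraction rule in `flag_candidates` and the `min_excess` cutoff are hypothetical choices made for illustration; the disclosure does not fix them.

```python
def variant_fractions(pileup, reference):
    """Fraction of reads at each position whose base differs from the reference.

    pileup: dict mapping position -> list of observed bases
    reference: dict mapping position -> reference base
    """
    fractions = {}
    for pos, bases in pileup.items():
        if not bases:
            continue  # position not covered
        n_variant = sum(1 for b in bases if b != reference[pos])
        fractions[pos] = n_variant / len(bases)
    return fractions


def flag_candidates(sample_fracs, reference_fracs, min_excess=0.005):
    """Positions whose variant fraction exceeds the reference sample's by more than
    min_excess (a hypothetical background-subtraction rule)."""
    return sorted(
        pos for pos, f in sample_fracs.items()
        if f - reference_fracs.get(pos, 0.0) > min_excess
    )


pileup = {100: ["A"] * 98 + ["T"] * 2, 101: ["C"] * 100}
sample = variant_fractions(pileup, {100: "A", 101: "C"})
print(flag_candidates(sample, {100: 0.001, 101: 0.0}))  # prints [100]
```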
[006] Additionally, the disclosure provides a method for characterizing the heterogeneity of an abnormal condition in an individual, the method comprising generating a genetic profile of extracellular polynucleotides in the individual, wherein the genetic profile comprises a plurality of data resulting from copy number variation and/or other rare mutation (e.g., genetic alteration) analyses.
[007] In some embodiments, the prevalence/concentration of each rare variant identified in the individual is reported and quantified simultaneously. In other embodiments, a confidence score relating to the prevalence/concentration of rare variants in the individual is reported.
[008] In some embodiments, the extracellular polynucleotides comprise DNA. In other embodiments, the extracellular polynucleotides comprise RNA. The polynucleotides can be fragments or can be fragmented after isolation. In addition, the disclosure provides a method for extracting and isolating circulating nucleic acids.
[009] In some embodiments, the extracellular polynucleotides are isolated from a bodily sample that can be selected from the group consisting of blood, plasma, serum, urine, saliva, mucosal excretions, sputum, stool and tears.
[010] In some embodiments, the methods of the disclosure also comprise a step of determining the percentage of sequences that have copy number variation or another rare genetic alteration (for example, sequence variants) in said bodily sample.
[011] In some embodiments, the percentage of sequences having copy number variation in said bodily sample is determined by calculating the percentage of predefined regions with an amount of polynucleotides above or below a predetermined threshold.
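As a small worked example of [011], the helper below computes the share of regions whose normalized ratio falls outside a band around 1.0. The `lower` and `upper` thresholds are assumed values, not ones stated in the disclosure.

```python
def percent_regions_altered(normalized_ratios, lower=0.8, upper=1.2):
    """Percentage of predefined regions whose normalized ratio lies below `lower`
    or above `upper` (hypothetical thresholds)."""
    if not normalized_ratios:
        raise ValueError("no regions supplied")
    altered = sum(1 for r in normalized_ratios if r < lower or r > upper)
    return 100.0 * altered / len(normalized_ratios)


print(percent_regions_altered([1.0, 0.5, 1.5, 1.1]))  # prints 50.0
```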
[012] In some embodiments, bodily fluids are drawn from an individual suspected of having an abnormal condition that can be selected from the group consisting of rare mutations, single nucleotide variants, indels, copy number variations, transversions, translocations, inversions, deletions, aneuploidy, partial aneuploidy, polyploidy, chromosomal instability, chromosomal structure alterations, gene fusions, chromosome fusions, gene truncations, gene amplification, gene duplications, chromosomal lesions, DNA lesions, abnormal changes in nucleic acid chemical modifications, abnormal changes in epigenetic patterns, abnormal changes in nucleic acid methylation, infection and cancer.
[013] In some embodiments, the individual may be a pregnant woman, wherein the abnormal condition may be a fetal abnormality selected from the group consisting of single nucleotide variants, indels, copy number variations, transversions, translocations, inversions, deletions, aneuploidy, partial aneuploidy, polyploidy, chromosomal instability, chromosomal structure alterations, gene fusions, chromosome fusions, gene truncations, gene amplification, gene duplications, chromosomal lesions, DNA lesions, abnormal changes in nucleic acid chemical modifications, abnormal changes in epigenetic patterns, abnormal changes in nucleic acid methylation, infection and cancer.
[014] In some embodiments, the method may comprise attaching one or more barcodes to the extracellular polynucleotides or fragments thereof prior to sequencing, wherein the barcodes are unique. In other embodiments, the barcodes attached to the extracellular polynucleotides or fragments thereof prior to sequencing are not unique.
[015] In some embodiments, the methods of the disclosure may comprise selectively enriching regions of the individual's genome or transcriptome prior to sequencing. In other embodiments, the methods of the disclosure comprise non-selectively enriching regions of the individual's genome or transcriptome prior to sequencing.
[016] Furthermore, the methods of the disclosure comprise attaching one or more barcodes to the extracellular polynucleotides or fragments thereof prior to any amplification or enrichment step.
[017] In some embodiments, the barcode is a polynucleotide, which may further comprise a random sequence or a fixed or semi-random set of oligonucleotides that, in combination with the diversity of molecules sequenced from a selected region, enables identification of unique molecules, and is at least a 3-, 5-, 10-, 15-, 20-, 25-, 30-, 35-, 40-, 45- or 50-mer in length.
[018] In some embodiments, the extracellular polynucleotides or fragments thereof can be amplified. In some embodiments, amplification comprises global amplification or whole-genome amplification.
[019] In some embodiments, sequence reads of unique identity can be detected based on sequence information at the beginning (start) and end (stop) regions of the sequence read and on the length of the sequence read. In other embodiments, sequence molecules of unique identity are detected based on sequence information at the beginning (start) and end (stop) regions of the sequence read, the length of the sequence read and the attachment of a barcode.
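The identity rule in [019] can be illustrated by keying each read on its start sequence, end sequence, length and optional barcode; reads sharing a key are treated as copies of one molecule. The window size `k=8` and the helper names are hypothetical, and practical pipelines often use mapped start and end coordinates instead of raw sequence.

```python
def molecule_key(sequence, barcode=None, k=8):
    """Identity key from the first k bases (start), last k bases (stop), the read
    length and, when present, the attached barcode."""
    return (sequence[:k], sequence[-k:], len(sequence), barcode)


def count_unique_molecules(reads, k=8):
    """reads: iterable of (sequence, barcode) pairs; barcode may be None."""
    return len({molecule_key(seq, bc, k) for seq, bc in reads})


reads = [
    ("ACGTACGTAAAACCCCGGGG", "BC1"),
    ("ACGTACGTAAAACCCCGGGG", "BC1"),  # duplicate of the first molecule
    ("ACGTACGTAAAACCCCGGGG", "BC2"),  # same ends and length, different barcode
    ("TTTTACGTAAAACCCCGGGG", "BC1"),  # different start sequence
]
print(count_unique_molecules(reads))                          # prints 3
print(count_unique_molecules([(s, None) for s, _ in reads]))  # prints 2
```

Adding the barcode to the key separates molecules that happen to share fragment ends, which is why the count rises from 2 to 3.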
[020] In some embodiments, amplification comprises selective amplification, non-selective amplification, suppression amplification or subtractive enrichment.
[021] In some embodiments, the methods of the disclosure comprise removing a subset of the reads from further analysis prior to quantifying or enumerating the reads.
[022] In some embodiments, the method may comprise filtering out reads with an accuracy or quality score below a threshold, for example, 90%, 99%, 99.9% or 99.99%, and/or a mapping score below a threshold, for example, 90%, 99%, 99.9% or 99.99%. In other embodiments, the methods of the disclosure comprise filtering out reads with a quality score below a set threshold.
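Accuracy thresholds such as those in [022] correspond to Phred quality scores via Q = -10 log10(1 - accuracy), so 90%, 99%, 99.9% and 99.99% map to Q10, Q20, Q30 and Q40. The sketch below shows the conversion and one assumed way of applying a read-level threshold (mean per-base error probability); the disclosure does not specify how per-base scores are aggregated, and `passes_quality` is a hypothetical name.

```python
import math


def phred_from_accuracy(accuracy):
    """Phred score equivalent of a per-base accuracy: Q = -10 * log10(1 - accuracy)."""
    return -10.0 * math.log10(1.0 - accuracy)


def passes_quality(phred_scores, min_accuracy=0.99):
    """Keep a read if its mean per-base error probability does not exceed
    1 - min_accuracy (one hypothetical way to apply a read-level threshold)."""
    errors = [10 ** (-q / 10.0) for q in phred_scores]
    return sum(errors) / len(errors) <= 1.0 - min_accuracy


for acc in (0.9, 0.99, 0.999, 0.9999):
    print(acc, round(phred_from_accuracy(acc)))  # Q10, Q20, Q30, Q40
```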
[023] In some embodiments, the predefined regions are uniform or substantially uniform in size, about 10 kb, 20 kb, 30 kb, 40 kb, 50 kb, 60 kb, 70 kb, 80 kb, 90 kb or 100 kb in size. In some embodiments, at least 50, 100, 200, 500, 1,000, 2,000, 5,000, 10,000, 20,000 or 50,000 regions are analyzed.
[024] In some embodiments, a genetic variant, rare mutation or copy number variation occurs in a region of the genome selected from the group consisting of gene fusions, gene duplications, gene deletions, gene translocations, microsatellite regions, gene fragments or combinations thereof. In other embodiments, a genetic variant, rare mutation or copy number variation occurs in a region of the genome selected from the group consisting of genes, oncogenes, tumor suppressor genes, promoters, regulatory sequence elements or combinations thereof. In some embodiments, the variant is a nucleotide variant, single-base substitution or small indel, transversion, translocation, inversion, deletion, truncation or gene truncation of about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15 or 20 nucleotides in length.
[025] In some embodiments, the method comprises correcting/normalizing/adjusting the number of mapped reads using barcodes or unique properties of individual reads.
[026] In some embodiments, enumeration of the reads is performed by enumerating the unique barcodes in each of the predefined regions and normalizing those numbers across at least a subset of the predefined regions that were sequenced. In some embodiments, samples taken at successive time intervals from the same individual are analyzed and compared to previous sample results. The method of the disclosure may further comprise determining partial copy number variation frequency, loss of heterozygosity, gene expression analysis, epigenetic analysis and hypermethylation analysis after amplification of the barcode-attached extracellular polynucleotides.
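Enumerating unique barcodes per region, as in [026], rather than raw reads makes the counts less sensitive to uneven amplification, since PCR duplicates share a barcode. A minimal sketch, assuming each read has been reduced to a mapped position and a barcode, with hypothetical function names and median normalization:

```python
from collections import defaultdict
from statistics import median


def unique_barcodes_per_region(reads, bin_size):
    """reads: iterable of (mapped_position, barcode). Returns {bin_index: n_distinct_barcodes}."""
    barcodes = defaultdict(set)
    for pos, bc in reads:
        barcodes[pos // bin_size].add(bc)
    return {b: len(s) for b, s in sorted(barcodes.items())}


def normalize_across_regions(counts):
    """Divide each region's barcode count by the median across regions (assumed scheme)."""
    m = median(counts.values())
    return {region: c / m for region, c in counts.items()}


reads = [(5, "A"), (7, "A"), (8, "B"), (15, "A"), (25, "C"), (26, "D"), (27, "C")]
counts = unique_barcodes_per_region(reads, bin_size=10)
print(counts)                            # prints {0: 2, 1: 1, 2: 2}
print(normalize_across_regions(counts))  # prints {0: 1.0, 1: 0.5, 2: 1.0}
```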
[027] In some embodiments, copy number variation and rare mutation analysis is performed on a cell-free or substantially cell-free sample obtained from an individual using multiplex sequencing, which comprises performing more than 10,000 sequencing reactions; simultaneously sequencing at least 10,000 different reads; or performing data analysis on at least 10,000 different reads across the genome. The method may comprise multiplex sequencing comprising performing data analysis on at least 10,000 different reads across the genome. The method may further comprise enumerating the sequenced reads that are uniquely identifiable.
[028] In some embodiments, in the methods of the disclosure, normalization and detection are performed using one or more of hidden Markov model, dynamic programming, support vector machine, Bayesian network, trellis decoding, Viterbi decoding, expectation maximization, Kalman filtering or neural network methodologies.
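Of the methodologies listed in [028], hidden Markov model segmentation with Viterbi decoding is a common way to turn noisy per-region ratios into copy-number calls. The sketch below assumes a three-state model (loss, neutral, gain) with Gaussian emissions on log2 ratios; the state means, the standard deviation `SD` and the self-transition probability `STAY` are illustrative values, not parameters from the disclosure.

```python
import math

# Hypothetical model: three copy-number states with Gaussian emissions on
# per-region log2(sample/control) ratios.
STATES = ("loss", "neutral", "gain")
MEANS = {"loss": -1.0, "neutral": 0.0, "gain": 0.585}  # log2(1/2), log2(1), log2(3/2)
SD = 0.2
STAY = 0.99  # probability of remaining in the same state between adjacent regions


def log_emission(x, state):
    mu = MEANS[state]
    return -0.5 * math.log(2 * math.pi * SD ** 2) - (x - mu) ** 2 / (2 * SD ** 2)


def log_transition(a, b):
    return math.log(STAY) if a == b else math.log((1 - STAY) / (len(STATES) - 1))


def viterbi(log_ratios):
    """Most likely state path for a sequence of log2 ratios (log-space Viterbi)."""
    scores = {s: math.log(1 / len(STATES)) + log_emission(log_ratios[0], s) for s in STATES}
    backpointers = []
    for x in log_ratios[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: scores[p] + log_transition(p, s))
            new_scores[s] = scores[prev] + log_transition(prev, s) + log_emission(x, s)
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=scores.get)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


ratios = [0.0, 0.05, -0.03, 0.6, 0.58, 0.6, 0.57, 0.61, 0.02, -0.01, 0.0]
print(viterbi(ratios))
```

The high self-transition probability penalizes isolated state switches, so a single noisy region is less likely to be called as a change than a run of consistently shifted regions.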
[029] In some embodiments, the methods of the disclosure comprise monitoring disease progression, monitoring residual disease, monitoring therapy, diagnosing a condition, prognosing a condition or selecting a therapy based on the discovered variants.
[030] In some embodiments, a therapy is modified based on the most recent sample analysis. In addition, the methods of the disclosure comprise inferring the genetic profile of a tumor, infection or other tissue abnormality. In some embodiments, the growth, remission or evolution of a tumor, infection or other tissue abnormality is monitored. In some embodiments, the individual's immune system is analyzed and monitored at single instances or over time.
[031] In some embodiments, the methods of the disclosure comprise identifying a variant, followed by an imaging test (for example, CT, PET-CT, MRI, X-ray, ultrasound) to localize the tissue abnormality suspected of causing the identified variant.
[032] In some embodiments, the methods of the disclosure comprise using genetic data obtained from a tissue or tumor biopsy of the patient. In some embodiments, the phylogenetics of a tumor, infection or other tissue abnormality is inferred.
[033] In some embodiments, the methods of the disclosure comprise performing population-based no-calling and identification of low-confidence regions. In some embodiments, obtaining measurement data for sequence coverage comprises measuring the depth of sequence coverage at every position of the genome. In some embodiments, correcting the measurement data for sequence coverage bias comprises calculating window-averaged coverage. In some embodiments, correcting the measurement data for sequence coverage bias comprises making adjustments to account for GC bias in the library construction and sequencing process. In some embodiments, correcting the measurement data for sequence coverage bias comprises making adjustments based on additional weighting factors associated with individual mappings to compensate for bias.
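The GC-bias adjustment in [033] can be approximated by comparing each region only with regions of similar GC content. The sketch below uses a stratified-median correction as a simplified stand-in for the regression-based (for example, LOESS) corrections often used in practice; the stratum count and the function name `gc_correct` are assumptions.

```python
from collections import defaultdict
from statistics import median


def gc_correct(coverage, gc_fraction, n_strata=20):
    """Stratified-median GC correction: divide each bin's coverage by the median
    coverage of bins in the same GC stratum, then rescale to the global median."""
    strata = [min(int(g * n_strata), n_strata - 1) for g in gc_fraction]
    by_stratum = defaultdict(list)
    for s, c in zip(strata, coverage):
        by_stratum[s].append(c)
    stratum_median = {s: median(v) for s, v in by_stratum.items()}
    global_median = median(coverage)
    return [
        c / stratum_median[s] * global_median if stratum_median[s] > 0 else 0.0
        for s, c in zip(strata, coverage)
    ]


# GC-rich bins (0.62-0.63) are under-covered relative to bins at 0.42-0.43.
corrected = gc_correct([100, 110, 50, 55], [0.42, 0.43, 0.62, 0.63], n_strata=10)
print([round(x, 1) for x in corrected])  # prints [73.8, 81.2, 73.8, 81.2]
```

After correction, the GC-rich and GC-poor bins with the same relative depth within their stratum receive the same value, removing the systematic coverage difference.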
[034] In some embodiments, in the methods of the disclosure, the extracellular polynucleotide is derived from a diseased cell origin. In some embodiments, the extracellular polynucleotide is derived from a healthy cell origin.
[035] The disclosure also provides a system comprising a computer-readable medium for performing the following steps: selecting predefined regions in a genome; enumerating the number of sequence reads in the predefined regions; normalizing the number of sequence reads across the predefined regions; and determining the percentage of copy number variation in the predefined regions. In some embodiments, the entire genome or at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80% or 90% of the genome is analyzed. In some embodiments, the computer-readable medium provides data on the percentage of cancer DNA or RNA in plasma or serum to the end user.
[036] In some embodiments, the amount of genetic variation, such as polymorphisms or causal variants, is analyzed. In some embodiments, the presence or absence of genetic alterations is detected.
[037] The disclosure also provides a method for detecting a rare mutation in a cell-free or substantially cell-free sample obtained from an individual, comprising: a) sequencing extracellular polynucleotides from a bodily sample from the individual, wherein each of the extracellular polynucleotides generates a plurality of sequencing reads; b) filtering out reads that fail to meet a set threshold; c) mapping the sequence reads derived from the sequencing onto a reference sequence; d) identifying a subset of mapped sequence reads that align with a variant of the reference sequence at each mappable base position; e) for each mappable base position, calculating a ratio of (a) the number of mapped sequence reads that include a variant relative to the reference sequence to (b) the total number of sequence reads at that mappable base position; f) normalizing the ratios or variant frequencies at each mappable base position and determining potential rare variant(s) or other genetic alteration(s); and g) comparing the resulting number for each of the regions.
[038] This disclosure also provides a method comprising: a. providing at least one set of labeled parent polynucleotides, and for each set of labeled parent polynucleotides; b. amplifying the labeled parent polynucleotides in the set to produce a corresponding set of amplified progeny polynucleotides; c. sequencing a subset (including a proper subset) of the set of amplified progeny polynucleotides to produce a set of sequencing reads; and d. collapsing the set of sequencing reads to generate a set of consensus sequences, each consensus sequence corresponding to a unique polynucleotide among the set of labeled parent polynucleotides. In certain embodiments, the method further comprises: e. analyzing the set of consensus sequences for each set of labeled parent molecules.
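Collapsing sequencing reads into consensus sequences, as in [038], can be sketched as grouping reads by the tag of their parent polynucleotide and taking a per-position majority vote. This assumes equal-length, already-aligned reads and uses the tag alone to define families; both are simplifications for illustration, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def group_families(reads):
    """reads: iterable of (tag, sequence); returns {tag: [sequences]}."""
    families = defaultdict(list)
    for tag, seq in reads:
        families[tag].append(seq)
    return families


def majority_consensus(seqs):
    """Per-position majority vote over the reads of one family (ties are broken by
    first occurrence), truncated to the shortest read."""
    length = min(len(s) for s in seqs)
    return "".join(
        Counter(s[i] for s in seqs).most_common(1)[0][0] for i in range(length)
    )


def collapse(reads):
    """Map each parent tag to one consensus sequence."""
    return {tag: majority_consensus(seqs) for tag, seqs in group_families(reads).items()}


reads = [("t1", "ACGT"), ("t1", "ACGT"), ("t1", "ACTT"), ("t2", "GGCC"), ("t2", "GGCC")]
print(collapse(reads))  # prints {'t1': 'ACGT', 't2': 'GGCC'}
```

The lone `T` at position 2 of family `t1` is outvoted, which is how collapsing removes errors confined to a minority of a family's reads.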
[039] In some embodiments, each polynucleotide in a set is mappable to a reference sequence.
[040] In some embodiments, the method comprises providing a plurality of sets of labeled parent polynucleotides, with each set mapping to a different reference sequence.
[041] In some embodiments, the method further comprises converting the starting genetic material into the labeled parent polynucleotides.
[042] In some embodiments, the initial genetic material comprises no more than 100 ng of polynucleotides.
[043] In some embodiments, the method comprises restricting the initial starting genetic material prior to conversion.
[044] In some embodiments, the method comprises converting the starting genetic material into labeled parent polynucleotides with a conversion efficiency of at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 80% or at least 90%.
[045] In some embodiments, the conversion comprises any of blunt-end ligation, sticky-end ligation, molecular inversion probes, PCR, ligation-based PCR, single-strand ligation and single-strand circularization.
[046] In some embodiments, the starting genetic material is a cell-free nucleic acid.
[047] In some embodiments, a plurality of reference sequences are from the same genome.
[048] In some embodiments, each parent polynucleotide labeled in the set is uniquely labeled.
[049] In some embodiments, the labels are non-unique.
[050] In some embodiments, generation of the consensus sequences is based on the tag information and/or at least one of the sequence information at the beginning (start) region of the sequence read, the sequence information at the end (stop) region of the sequence read and the length of the sequence read.
[051] In some embodiments, the method comprises sequencing a subset of the set of amplified progeny polynucleotides sufficient to produce sequence reads for at least one progeny of each of at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, at least 99%, at least 99.9% or at least 99.99% of the unique polynucleotides in the set of labeled parent polynucleotides.
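The sequencing depth needed to reach the coverage fractions in [051] can be estimated under the assumption that reads are sampled uniformly at random, so the number of reads per parent polynucleotide is Poisson-distributed; the expected fraction of N unique parents with at least one sequenced progeny after R reads is then 1 - exp(-R/N). Real amplification is uneven, which lowers the observed fraction for a given depth, so this sketch gives a lower bound on the reads required rather than a design value. The function names are hypothetical.

```python
import math


def fraction_observed(n_reads, n_unique_molecules):
    """Expected fraction of parent molecules with >= 1 sequenced progeny, assuming
    reads are drawn uniformly at random so per-molecule read counts are Poisson."""
    return 1.0 - math.exp(-n_reads / n_unique_molecules)


def reads_needed(target_fraction, n_unique_molecules):
    """Reads required for the expected observed fraction to reach target_fraction."""
    return math.ceil(-n_unique_molecules * math.log(1.0 - target_fraction))


print(round(fraction_observed(3000, 1000), 4))  # prints 0.9502
print(reads_needed(0.99, 1000))                  # prints 4606
```

Under these assumptions, observing 99% of parent molecules requires roughly 4.6 reads per unique molecule.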
[052] In some embodiments, the at least one progeny is a plurality of progeny, for example, at least 2, at least 5 or at least 10 progeny.
[053] In some embodiments, the number of sequence reads in the set of sequence reads is greater than the number of unique labeled parent polynucleotides in the set of labeled parent polynucleotides.
[054] In some embodiments, the subset of the set of amplified progeny polynucleotides that is sequenced is sufficiently large that any nucleotide sequence represented in the set of labeled parent polynucleotides at a percentage equal to the per-base sequencing error rate of the sequencing platform used has at least a 50%, at least a 60%, at least a 70%, at least an 80%, at least a 90%, at least a 95%, at least a 98%, at least a 99%, at least a 99.9% or at least a 99.99% chance of being represented in the set of consensus sequences.
[055] In some embodiments, the method comprises enriching the set of amplified progeny polynucleotides for polynucleotides mapping to one or more selected reference sequences by: (i) selective amplification of sequences from the initial starting genetic material converted into labeled parent polynucleotides; (ii) selective amplification of labeled parent polynucleotides; (iii) selective sequence capture of amplified progeny polynucleotides; or (iv) selective sequence capture of the initial starting genetic material.
[056] In some embodiments, the analysis comprises normalizing a measure (for example, a number) taken from a set of consensus sequences against a measure taken from a set of consensus sequences from a control sample.
[057] In some embodiments, the analysis comprises detecting rare mutations, single nucleotide variants, indels, copy number variations, transversions, translocations, inversions, deletions, aneuploidy, partial aneuploidy, polyploidy, chromosomal instability, chromosomal structure alterations, gene fusions, chromosome fusions, gene truncations, gene amplification, gene duplications, chromosomal lesions, DNA lesions, abnormal changes in nucleic acid chemical modifications, abnormal changes in epigenetic patterns, abnormal changes in nucleic acid methylation, infection or cancer.
[058] In some embodiments, the polynucleotides comprise DNA, RNA, a combination of the two, or DNA plus RNA-derived cDNA.
[059] In some embodiments, a certain subset of polynucleotides is selected for, or enriched, based on polynucleotide length in base pairs, from the initial set of polynucleotides or from the amplified polynucleotides.
[060] In some embodiments, the analysis further comprises detecting and monitoring an abnormality or disease within an individual, such as infection and/or cancer.
[061] In some embodiments, the method is performed in combination with immune repertoire profiling.
[062] In some embodiments, the polynucleotides are extracted from a sample selected from the group consisting of blood, plasma, serum, urine, saliva, mucosal excretions, sputum, stool and tears.
[063] In some embodiments, collapsing comprises detecting and/or correcting errors, nicks or lesions present in the sense or antisense strand of the labeled parent polynucleotides or the amplified progeny polynucleotides.
[064] This disclosure also provides a method comprising detecting a genetic variation in the initial starting genetic material with a sensitivity of at least 5%, at least 1%, at least 0.5%, at least 0.1% or at least 0.05%. In some embodiments, the initial starting genetic material is provided in an amount of less than 100 ng of nucleic acid, the genetic variation is copy number/heterozygosity variation, and detection is performed with sub-chromosomal resolution; for example, resolution of at least 100 megabases, at least 10 megabases, at least 1 megabase, at least 100 kilobases, at least 10 kilobases or at least 1 kilobase. In another embodiment, the method comprises providing a plurality of sets of labeled parent polynucleotides, each set mapping to a different reference sequence. In another embodiment, the reference sequence is the locus of a tumor marker and the analysis comprises detecting the tumor marker in the set of consensus sequences. In another embodiment, the tumor marker is present in the set of consensus sequences at a frequency lower than the error rate introduced in the amplification step. In another embodiment, the at least one set is a plurality of sets and the reference sequences comprise a plurality of reference sequences, each of which is the locus of a tumor marker. In another embodiment, the analysis comprises detecting copy number variation of the consensus sequences between at least two sets of parent polynucleotides. In another embodiment, the analysis comprises detecting the presence of sequence variations relative to the reference sequences. In another embodiment, the analysis comprises detecting the presence of sequence variations relative to the reference sequences and detecting copy number variation of the consensus sequences between at least two sets of parent polynucleotides. In another embodiment, collapsing comprises: i. grouping sequence reads from amplified progeny polynucleotides into families, each family being amplified from the same labeled parent polynucleotide; and ii. determining a consensus sequence based on the sequence reads in a family.
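A tumor marker can be detected in the consensus set at a frequency below the raw error rate, as described in [064], because an error must recur in most reads of a family to survive collapsing. Assuming independent per-read errors that always produce the same wrong base (a pessimistic simplification), the chance that a consensus base is wrong is the binomial probability of a strict majority of erroneous reads. This models sequencing errors only; an error introduced in an early amplification cycle is inherited by all descendants and is not removed this way. The function name is hypothetical.

```python
from math import comb


def consensus_error_prob(family_size, per_read_error):
    """Probability that a strict majority of a family's reads share an error at a
    position, assuming independent per-read errors that all yield the same wrong
    base (a pessimistic simplification)."""
    n, e = family_size, per_read_error
    return sum(comb(n, k) * e ** k * (1 - e) ** (n - k) for k in range(n // 2 + 1, n + 1))


for n in (1, 3, 5):
    print(n, consensus_error_prob(n, 0.01))  # roughly 1e-2, 3e-4 and 1e-5
```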
[065] This disclosure also provides a system comprising a computer-readable medium for performing the following steps: a. providing at least one set of labeled parent polynucleotides, and for each set of labeled parent polynucleotides; b. amplifying the labeled parent polynucleotides in the set to produce a corresponding set of amplified progeny polynucleotides; c. sequencing a subset (including a proper subset) of the set of amplified progeny polynucleotides to produce a set of sequencing reads; and d. collapsing the set of sequencing reads to generate a set of consensus sequences, each consensus sequence corresponding to a unique polynucleotide among the set of labeled parent polynucleotides; and, optionally, e. analyzing the set of consensus sequences for each set of labeled parent molecules.
[066] This disclosure also provides a method comprising: a. providing at least one set of labeled parent polynucleotides, and for each set of labeled parent polynucleotides; b. amplifying the labeled parent polynucleotides in the set to produce a corresponding set of amplified progeny polynucleotides; c. sequencing a subset (including a proper subset) of the set of amplified progeny polynucleotides to produce a set of sequencing reads; d. collapsing the set of sequencing reads to generate a set of consensus sequences, each consensus sequence corresponding to a unique polynucleotide among the set of labeled parent polynucleotides; and e. filtering out, from among the consensus sequences, those that fail to meet a quality threshold. In one embodiment, the quality threshold considers the number of sequence reads from amplified progeny polynucleotides collapsed into a consensus sequence. This disclosure also provides a system comprising a computer-readable medium for performing the foregoing method.
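One way to implement the quality threshold in [066] is to require a minimum family size (the number of reads collapsed into the consensus) together with a minimum per-position agreement with the consensus. Both cutoffs (`min_reads`, `min_agreement`) and the helper names are hypothetical choices for illustration.

```python
from collections import Counter


def consensus_with_agreement(seqs):
    """Majority-vote consensus plus the lowest per-position agreement fraction."""
    length = min(len(s) for s in seqs)
    bases, agreement = [], 1.0
    for i in range(length):
        base, count = Counter(s[i] for s in seqs).most_common(1)[0]
        bases.append(base)
        agreement = min(agreement, count / len(seqs))
    return "".join(bases), agreement


def passing_consensus(families, min_reads=3, min_agreement=0.9):
    """Keep consensus sequences from families with at least `min_reads` reads whose
    weakest position still reaches `min_agreement` (hypothetical thresholds)."""
    kept = {}
    for tag, seqs in families.items():
        if len(seqs) < min_reads:
            continue
        seq, agreement = consensus_with_agreement(seqs)
        if agreement >= min_agreement:
            kept[tag] = seq
    return kept


families = {
    "a": ["ACGT"] * 4,              # passes both filters
    "b": ["ACGT", "ACGT"],          # too few reads
    "c": ["ACGT", "ACGA", "ACGC"],  # last position has no majority support
}
print(passing_consensus(families))  # prints {'a': 'ACGT'}
```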
[067] This disclosure also provides a method that comprises: a. providing at least one set of labeled parent polynucleotides, each set mapping to a different reference sequence in one or more genomes, and, for each set of labeled parent polynucleotides: i. amplifying the labeled parent polynucleotides to produce a set of amplified progeny polynucleotides; ii. sequencing a subset of the set of amplified polynucleotides to produce a set of sequence reads; and iii. collapsing the sequence reads by: 1. grouping the sequence reads from the amplified progeny polynucleotides into families, wherein each family is amplified from the same labeled parent polynucleotide. In one embodiment, the collapsing further comprises: 2. determining a quantitative measure of the sequence reads in each family. In another embodiment, the method further comprises (in addition to step a): b. determining a quantitative measure of unique families; and c. based on (1) the quantitative measure of unique families and (2) the quantitative measure of sequence reads in each group, inferring a measure of unique labeled parent polynucleotides in the set. In another embodiment, the inference is made using statistical or probabilistic models. In another embodiment, the at least one set is a plurality of sets. In another embodiment, the method further comprises correcting for representational or amplification bias between the two sets. In another embodiment, the method further comprises using a control or a set of control samples to correct for representational or amplification biases between the two sets. In another embodiment, the method further comprises determining copy number variation between the sets. In another embodiment, the method further comprises (in addition to steps a, b and c): d. determining a quantitative measure of polymorphic forms among the families; and e. based on the quantitative measure of the polymorphic forms, inferring a quantitative measure of polymorphic forms in the inferred number of unique labeled parent polynucleotides. In another embodiment, the polymorphic forms include, but are not limited to: substitutions, insertions, deletions, inversions, microsatellite changes, transversions, translocations, fusions, methylation, hypermethylation, hydroxymethylation, acetylation, epigenetic variants, regulatory-associated variants or protein binding sites. In another embodiment, wherein the sets are derived from a common sample, the method further comprises: a. inferring copy number variation for the plurality of sets based on a comparison of the inferred number of labeled parent polynucleotides in each set mapping to each of a plurality of reference sequences. In another embodiment, the original number of polynucleotides in each set is further inferred. This disclosure also provides a system that comprises a computer-readable medium for carrying out the aforementioned methods.
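The grouping of sequence reads into families, and the per-family read counts that feed the later quantitative steps, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: each read is assumed to arrive already paired with a family key (here a hypothetical (tag, start, stop) tuple), and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def group_into_families(reads):
    """Group (family_key, sequence) pairs into families keyed by family_key."""
    families = defaultdict(list)
    for key, sequence in reads:
        families[key].append(sequence)
    return dict(families)


def family_size_summary(families):
    """Return the number of unique families and a histogram of family sizes."""
    size_histogram = Counter(len(members) for members in families.values())
    return len(families), size_histogram


# Hypothetical family keys: (tag, start, stop) for each read.
reads = [
    (("AC", 100, 250), "ACGT"),
    (("AC", 100, 250), "ACGT"),
    (("GT", 100, 250), "ACGA"),
    (("AC", 300, 420), "TTGA"),
]
families = group_into_families(reads)
n_families, sizes = family_size_summary(families)
print(n_families, dict(sizes))
```

Reads that share a tag but map to different coordinates fall into different families, and reads with the same coordinates but different tags do too, so this example yields three families: one with two reads and two with one read each.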
[068] This disclosure also provides a method for determining copy number variation in a sample that includes polynucleotides, the method comprising: a. providing at least two sets of first polynucleotides, wherein each set maps to a different reference sequence in a genome, and, for each set of first polynucleotides: i. amplifying the polynucleotides to produce a set of amplified polynucleotides; ii. sequencing a subset of the set of amplified polynucleotides to produce a set of sequence reads; iii. grouping the sequence reads from the amplified polynucleotides into families, each family being amplified from the same first polynucleotide in the set; and iv. inferring a quantitative measure of families in the set; and b. determining copy number variation by comparing the quantitative measure of families in each set. This disclosure also provides a system that comprises a computer-readable medium for carrying out the aforementioned methods.
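The final comparison step can be illustrated with a short Python sketch. It assumes, for illustration only, that the second set is a reference region present in two copies and that the number of unique families is the quantitative measure being compared; the function name is hypothetical.

```python
def copy_number_from_families(families_a, families_b, reference_copies=2):
    """Estimate copies of set A from unique-family counts, relative to set B.

    Only the number of distinct families is used, so a set whose molecules
    happened to amplify more (more reads per family) is not over-counted.
    """
    if not families_b:
        raise ValueError("reference set contains no families")
    ratio = len(families_a) / len(families_b)
    return ratio, reference_copies * ratio


# Set A: 30 families of 1 read each; set B: 20 families of 5 reads each.
set_a = {f"a{i}": ["read"] for i in range(30)}
set_b = {f"b{i}": ["read"] * 5 for i in range(20)}
ratio, copies = copy_number_from_families(set_a, set_b)
```

Counting raw reads in this example would give 30 / 100 = 0.3 and suggest a loss, whereas the family counts give a ratio of 1.5, that is, about three copies relative to a two-copy reference.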
[069] This disclosure also provides a method for inferring the frequency of sequence calls in a sample of polynucleotides, comprising: a. providing at least one set of first polynucleotides, wherein each set maps to a different reference sequence in one or more genomes, and, for each set of first polynucleotides: i. amplifying the first polynucleotides to produce a set of amplified polynucleotides; ii. sequencing a subset of the set of amplified polynucleotides to produce a set of sequence reads; and iii. grouping the sequence reads into families, each family comprising the sequence reads of the polynucleotides amplified from the same first polynucleotide; and b. inferring, for each set of first polynucleotides, a call frequency for one or more bases in the set of first polynucleotides, wherein the inferring comprises: i. assigning, for each family, a confidence score to each of a plurality of calls, the confidence score taking into account a frequency of the call among members of the family; and ii. estimating a frequency of one or more calls taking into account the confidence scores of the one or more calls assigned to each family. This disclosure also provides a system that comprises a computer-readable medium for carrying out the aforementioned methods.
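The inference step can be sketched in Python under two simple assumptions that the disclosure does not fix: a family's confidence score for a call is the fraction of its members carrying that call, and the call frequency for the set is the mean of these scores over families. The function names are hypothetical.

```python
from collections import Counter


def family_confidence(bases):
    """Confidence score per call: fraction of family members with that base."""
    counts = Counter(bases)
    return {base: n / len(bases) for base, n in counts.items()}


def estimate_call_frequency(families, call):
    """Mean, over families, of the confidence score assigned to `call`."""
    if not families:
        return 0.0
    scores = [family_confidence(bases).get(call, 0.0) for bases in families]
    return sum(scores) / len(scores)


# Base calls at one position, one inner list per family.
families_at_position = [
    ["A", "A", "A"],
    ["T", "T", "T", "A"],
    ["A", "A"],
    ["T", "T"],
]
t_frequency = estimate_call_frequency(families_at_position, "T")
```

Here the second family contributes 0.75 rather than a full count for T, because one of its four reads disagrees, giving an estimated T frequency of 1.75 / 4 = 0.4375.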
[070] This disclosure also provides a method for communicating sequence information about at least one individual polynucleotide molecule, comprising: a. providing at least one individual polynucleotide molecule; b. encoding sequence information in the at least one individual polynucleotide molecule to produce a signal; c. passing at least part of the signal through a channel to produce a received signal comprising nucleotide sequence information about the at least one individual polynucleotide molecule, wherein the received signal comprises noise and/or distortion; d. decoding the received signal to produce a message comprising sequence information about the at least one individual polynucleotide molecule, wherein the decoding reduces noise and/or distortion in the message; and e. providing the message to a recipient. In one embodiment, the noise comprises incorrect nucleotide calls. In another embodiment, the distortion comprises uneven amplification of the individual polynucleotide molecule relative to other individual polynucleotide molecules. In another embodiment, the distortion results from amplification or sequencing bias. In another embodiment, the at least one individual polynucleotide molecule is a plurality of individual polynucleotide molecules, and the decoding produces a message about each molecule in the plurality. In another embodiment, the encoding comprises amplifying the at least one individual polynucleotide molecule, which has optionally been tagged, wherein the signal comprises a collection of amplified molecules. In another embodiment, the channel comprises a polynucleotide sequencer, and the received signal comprises sequence reads from a plurality of polynucleotides amplified from the at least one individual polynucleotide molecule. In another embodiment, the decoding comprises grouping the sequence reads of the molecules amplified from each of the at least one individual polynucleotide molecule. In another embodiment, the decoding comprises a probabilistic or statistical method of filtering the generated sequence signal. This disclosure also provides a system that comprises a computer-readable medium for carrying out the aforementioned methods.
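Why grouping reads by originating molecule reduces noise can be seen in a simple model, assumed here for illustration: each read in a family independently miscalls a base with probability p, and, as a worst case, every miscall produces the same wrong base. A majority-vote consensus of n reads is then wrong only when at least half of the reads are wrong. The Python sketch below evaluates that bound; the function name is hypothetical.

```python
from math import comb


def majority_error_bound(p, n):
    """Worst-case error of a majority-vote consensus over n reads.

    Assumes each read independently miscalls the base with probability p and
    that all miscalls agree on the same wrong base; ties count as errors.
    """
    k_min = (n + 1) // 2  # smallest number of wrong reads that is at least n / 2
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


for family_size in (1, 3, 5):
    print(family_size, majority_error_bound(0.01, family_size))
```

With p = 0.01, a single read is wrong 1% of the time, whereas a three-read family under this model has a consensus error bound of about 0.0003.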
[071] In another embodiment, the polynucleotides are derived from tumor genomic DNA or RNA. In another embodiment, the polynucleotides are derived from cell-free polynucleotides, exosomal polynucleotides, bacterial polynucleotides or viral polynucleotides. Another embodiment further comprises detecting and/or associating affected molecular pathways. Another embodiment further comprises serially monitoring the health or disease state of an individual. In another embodiment, the phylogeny of a disease-associated genome within an individual is inferred. Another embodiment further comprises diagnosing, monitoring or treating a disease. In another embodiment, the treatment regimen is selected or modified based on the detected polymorphic forms, CNVs or associated pathways. In another embodiment, the treatment comprises a combination therapy.
[072] This disclosure also provides a computer-readable medium in non-transitory tangible form comprising executable code configured to perform the following steps: selecting predefined regions in a genome; accessing sequence reads and enumerating the number of sequence reads in the predefined regions; normalizing the number of sequence reads across the predefined regions; and determining the percentage of copy number variation in the predefined regions.
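The four steps can be sketched in Python under illustrative assumptions not taken from the disclosure: reads are represented by their mapped start positions, regions are equal-sized and non-overlapping, counts are normalized to the median region, and a region is counted as copy-number-variant when its normalized value falls outside 0.75 to 1.25. The thresholds and function names are hypothetical.

```python
from bisect import bisect_right
from statistics import median


def count_reads_in_regions(read_starts, region_starts, region_size):
    """Count reads whose start falls inside each equal-sized region."""
    counts = [0] * len(region_starts)
    for pos in read_starts:
        i = bisect_right(region_starts, pos) - 1  # last region starting at or before pos
        if i >= 0 and pos < region_starts[i] + region_size:
            counts[i] += 1
    return counts


def percent_cnv_regions(counts, low=0.75, high=1.25):
    """Normalize counts to the median region and report the % outside [low, high]."""
    med = median(counts)
    if med == 0:
        raise ValueError("median region count is zero")
    ratios = [c / med for c in counts]
    flagged = sum(1 for r in ratios if r < low or r > high)
    return ratios, 100.0 * flagged / len(counts)


region_starts = [0, 100, 200, 300]  # sorted, non-overlapping regions of size 100
read_starts = [*range(0, 10), *range(100, 110), *range(200, 210), *range(300, 320)]
counts = count_reads_in_regions(read_starts, region_starts, region_size=100)
ratios, percent = percent_cnv_regions(counts)
```

Normalizing to the median rather than the mean keeps a few strongly amplified regions from shifting the baseline for all other regions.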
[073] This disclosure also provides a computer-readable medium in non-transitory tangible form comprising executable code configured to perform the following steps: a. accessing a data file comprising a plurality of sequencing reads; b. filtering out reads that fail to meet a set threshold; c. mapping the sequence reads derived from the sequencing to a reference sequence; d. identifying a subset of mapped sequence reads that align with a variant of the reference sequence at each mappable base position; e. for each mappable base position, calculating a ratio of (a) the number of mapped sequence reads that include a variant relative to the reference sequence to (b) the total number of sequence reads for that mappable base position; f. normalizing the ratios or frequency of variance for each mappable base position and determining potential rare variant(s) or other genetic alteration(s); and g. comparing the resulting number for each of the regions with potential rare variant(s) or mutation(s) to similarly derived numbers from a reference sample.
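The per-position ratio, normalization and comparison steps can be sketched in Python for a handful of positions. The sketch assumes reads have already been filtered and mapped into a pileup (position to list of base calls) and, as one simple choice of normalization, subtracts the variant fraction seen at the same position in a reference sample. The 0.5% margin and the function names are hypothetical.

```python
def variant_fractions(pileup, reference_bases):
    """Fraction of reads at each position whose call differs from the reference."""
    fractions = {}
    for pos, calls in pileup.items():
        if calls:  # skip positions with no coverage
            variant_reads = sum(1 for b in calls if b != reference_bases[pos])
            fractions[pos] = variant_reads / len(calls)
    return fractions


def candidate_rare_variants(sample, reference_sample, min_excess=0.005):
    """Positions whose variant fraction exceeds the reference sample's by min_excess."""
    return sorted(
        pos for pos, frac in sample.items()
        if frac - reference_sample.get(pos, 0.0) > min_excess
    )


reference_bases = {10: "A", 11: "C"}
pileup = {10: ["A"] * 98 + ["T"] * 2, 11: ["C"] * 100}
sample = variant_fractions(pileup, reference_bases)
background = {10: 0.001, 11: 0.001}  # fractions from a reference sample
candidates = candidate_rare_variants(sample, background)
```

Position 10 shows 2% variant reads against a 0.1% background and is reported; position 11 matches the reference at every read and is not.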
[074] This disclosure also provides a computer-readable medium in non-transitory tangible form comprising executable code configured to perform the following steps: a. accessing a data file comprising a plurality of sequencing reads, wherein the sequence reads are derived from a set of progeny polynucleotides amplified from at least one set of labeled parent polynucleotides; and b. collapsing the set of sequencing reads to generate a set of consensus sequences, each consensus sequence corresponding to a unique polynucleotide among the set of labeled parent polynucleotides.
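Collapsing a family of reads into a consensus sequence can be sketched as a per-position majority vote, assuming the reads of a family are already aligned and of equal length. Majority voting is one simple choice used here for illustration; in this sketch, ties go to the base encountered first. The function names are hypothetical.

```python
from collections import Counter


def consensus_sequence(reads):
    """Per-position majority vote over the aligned, equal-length reads of one family."""
    if not reads:
        raise ValueError("family has no reads")
    if any(len(r) != len(reads[0]) for r in reads):
        raise ValueError("reads must be aligned to equal length")
    # most_common(1) breaks ties in favour of the base encountered first.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


def collapse(families):
    """Map each family key to the consensus of its reads."""
    return {key: consensus_sequence(reads) for key, reads in families.items()}


consensus = collapse({"f1": ["ACGT", "ACGT", "ACTT"], "f2": ["GGA", "GGA"]})
```

The isolated G-to-T change in one read of family f1 is outvoted, so the family contributes a single consensus sequence, ACGT.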
[075] This disclosure also provides a computer-readable medium in non-transitory tangible form comprising executable code configured to perform the following steps: a. accessing a data file comprising a plurality of sequencing reads, wherein the sequence reads are derived from a set of progeny polynucleotides amplified from at least one set of labeled parent polynucleotides; b. collapsing the set of sequencing reads to generate a set of consensus sequences, each consensus sequence corresponding to a unique polynucleotide among the set of labeled parent polynucleotides; and c. filtering out, from among the consensus sequences, those that fail to meet a quality threshold.
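The filtering step can be illustrated by attaching a quality value to each consensus sequence and dropping those below a threshold. In this hypothetical sketch, quality is the lowest per-position fraction of reads agreeing with the consensus, and families with fewer than two reads are also discarded; both thresholds are assumptions, not values from the disclosure.

```python
from collections import Counter


def consensus_with_quality(reads):
    """Return (consensus, quality); quality is the lowest per-position agreement."""
    consensus, agreement = [], []
    for column in zip(*reads):  # assumes aligned reads of equal length
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base)
        agreement.append(count / len(column))
    return "".join(consensus), min(agreement)


def filter_consensus(families, min_quality=0.9, min_family_size=2):
    """Keep consensus sequences from families that meet both thresholds."""
    kept = {}
    for key, reads in families.items():
        if not reads or len(reads) < min_family_size:
            continue
        sequence, quality = consensus_with_quality(reads)
        if quality >= min_quality:
            kept[key] = sequence
    return kept


families = {
    "a": ["ACGT"] * 10,                # unanimous: kept
    "b": ["ACGT", "ACGA", "ACGT"],     # last position only 2/3 agree: dropped
    "c": ["ACGT"],                     # single read: dropped
}
kept = filter_consensus(families)
```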
[076] This disclosure also provides a computer-readable medium in non-transitory tangible form comprising executable code configured to perform the following steps: a. accessing a data file comprising a plurality of sequencing reads, wherein the sequence reads are derived from a set of progeny polynucleotides amplified from at least one set of labeled parent polynucleotides; and b. collapsing the sequence reads by: 1. grouping the sequence reads from the amplified progeny polynucleotides into families, each family being amplified from the same labeled parent polynucleotide, and, optionally, 2. determining a quantitative measure of the sequence reads in each family. In certain embodiments, the executable code further performs the steps of: c. determining a quantitative measure of unique families; and d. based on (1) the quantitative measure of unique families and (2) the quantitative measure of sequence reads in each group, inferring a measure of unique labeled parent polynucleotides in the set. In certain embodiments, the executable code further performs the steps of: e. determining a quantitative measure of polymorphic forms among the families; and f. based on the quantitative measure of the polymorphic forms, inferring a quantitative measure of polymorphic forms in the inferred number of unique labeled parent polynucleotides.
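The inference of unique labeled parent polynucleotides from the number of unique families and the reads per family is left open as to the statistical model. As one illustration, swapped in here rather than taken from the disclosure, the Chao1 richness estimator infers the number of parent molecules, including those never sampled, from the number of observed families and from how many families were seen with exactly one or exactly two reads. The function name is hypothetical.

```python
from collections import Counter


def estimate_unique_parents(family_sizes):
    """Chao1 estimate of parent molecules from reads-per-family counts.

    S_obs + f1^2 / (2 * f2), falling back to S_obs + f1 * (f1 - 1) / 2 when no
    family has exactly two reads.
    """
    histogram = Counter(family_sizes)
    observed = sum(histogram.values())
    f1, f2 = histogram.get(1, 0), histogram.get(2, 0)
    if f2 > 0:
        return observed + f1 * f1 / (2 * f2)
    return observed + f1 * (f1 - 1) / 2


# Reads per family for eight observed families.
estimate = estimate_unique_parents([1, 1, 1, 1, 2, 2, 5, 8])
```

Many families seen only once suggest that further molecules went unsampled, so here the estimate (12) exceeds the eight observed families; when no family is a singleton, the estimate equals the observed count.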
[077] This disclosure also provides a computer-readable medium in non-transitory tangible form comprising executable code configured to perform the following steps: a. accessing a data file comprising a plurality of sequencing reads, wherein the sequence reads are derived from a set of progeny polynucleotides amplified from at least one set of labeled parent polynucleotides, and grouping the sequence reads from the amplified polynucleotides into families, each family being amplified from the same first polynucleotide in the set; b. inferring a quantitative measure of families in the set; and c. determining copy number variation by comparing the quantitative measure of families in each set.
[078] This disclosure also provides a computer-readable medium in non-transitory tangible form comprising executable code configured to perform the following steps: a. accessing a data file comprising a plurality of sequencing reads, wherein the sequence reads are derived from a set of progeny polynucleotides amplified from at least one set of labeled parent polynucleotides, and grouping the sequence reads into families, each family comprising the sequence reads of the polynucleotides amplified from the same first polynucleotide; b. inferring, for each set of first polynucleotides, a call frequency for one or more bases in the set of first polynucleotides, wherein the inferring comprises: c. assigning, for each family, a confidence score to each of a plurality of calls, the confidence score taking into account a frequency of the call among members of the family; and d. estimating the frequency of the one or more calls taking into account the confidence scores of the one or more calls assigned to each family.
[079] This disclosure also provides a computer-readable medium in non-transitory tangible form comprising executable code configured to perform the following steps: a. accessing a data file comprising a received signal that comprises encoded sequence information from at least one individual polynucleotide molecule, wherein the received signal comprises noise and/or distortion; b. decoding the received signal to produce a message comprising sequence information about the at least one individual polynucleotide molecule, wherein the decoding reduces noise and/or distortion about each individual polynucleotide in the message; and c. writing the message comprising the sequence information about the at least one individual polynucleotide molecule to a computer file.
[080] This disclosure also provides a computer-readable medium in non-transitory tangible form comprising executable code configured to perform the following steps: a. accessing a data file comprising a plurality of sequencing reads, wherein the sequence reads are derived from a set of progeny polynucleotides amplified from at least one set of labeled parent polynucleotides; b. collapsing the set of sequencing reads to generate a set of consensus sequences, each consensus sequence corresponding to a unique polynucleotide among the set of labeled parent polynucleotides; and c. filtering out, from among the consensus sequences, those that fail to meet a quality threshold.
[081] This disclosure also provides a computer-readable medium in non-transitory tangible form comprising executable code configured to perform the following steps: a. accessing a data file comprising a plurality of sequencing reads, wherein the sequence reads are derived from a set of progeny polynucleotides amplified from at least one set of labeled parent polynucleotides; and b. collapsing the sequence reads by: i. grouping the sequence reads from the amplified progeny polynucleotides into families, each family being amplified from the same labeled parent polynucleotide; and ii. optionally, determining a quantitative measure of the sequence reads in each family. In certain embodiments, the executable code further performs the steps of: c. determining a quantitative measure of unique families; and d. based on (1) the quantitative measure of unique families and (2) the quantitative measure of sequence reads in each group, inferring a measure of unique labeled parent polynucleotides in the set. In certain embodiments, the executable code further performs the steps of: e. determining a quantitative measure of polymorphic forms among the families; and f. based on the quantitative measure of the polymorphic forms, inferring a quantitative measure of polymorphic forms in the inferred number of unique labeled parent polynucleotides. In certain embodiments, the executable code further performs the step of: g. inferring copy number variation for the plurality of sets based on a comparison of the inferred number of labeled parent polynucleotides in each set mapping to each of a plurality of reference sequences.
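The quantification of polymorphic forms among families, and its extrapolation to the inferred number of unique parent molecules, can be sketched in Python by counting the family consensus sequences that carry a given form and scaling that fraction. Applying the observed fraction to unsampled molecules is an assumption of this sketch, and the function names are hypothetical.

```python
def variant_family_fraction(consensus_by_family, position, variant_base):
    """Fraction of family consensus sequences carrying variant_base at position."""
    if not consensus_by_family:
        return 0.0
    carriers = sum(1 for seq in consensus_by_family.values() if seq[position] == variant_base)
    return carriers / len(consensus_by_family)


def inferred_variant_molecules(fraction, inferred_parents):
    """Scale the observed fraction to the inferred number of unique parent molecules."""
    return fraction * inferred_parents


consensus_by_family = {"f1": "ACGT", "f2": "ACTT", "f3": "ACGT", "f4": "ACTT"}
fraction = variant_family_fraction(consensus_by_family, position=2, variant_base="T")
molecules = inferred_variant_molecules(fraction, inferred_parents=12.0)
```

Counting families rather than reads means a variant-carrying molecule that happened to amplify heavily still counts once.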
[082] This disclosure also provides a computer-readable medium in non-transitory tangible form comprising executable code configured to perform the following steps: a. accessing a data file comprising a plurality of sequencing reads, wherein the sequence reads are derived from a set of progeny polynucleotides amplified from at least one set of labeled parent polynucleotides; b. grouping the sequence reads from the amplified polynucleotides into families, each family being amplified from the same first polynucleotide in the set; c. inferring a quantitative measure of families in the set; and d. determining copy number variation by comparing the quantitative measure of families in each set.
[083] This disclosure also provides a computer-readable medium in non-transitory tangible form comprising executable code configured to perform the following steps: a. accessing a data file comprising a plurality of sequencing reads, wherein the sequence reads are derived from a set of progeny polynucleotides amplified from at least one set of labeled parent polynucleotides, and grouping the sequence reads into families, each family comprising the sequence reads of the polynucleotides amplified from the same first polynucleotide; and b. inferring, for each set of first polynucleotides, a call frequency for one or more bases in the set of first polynucleotides, wherein the inferring comprises: i. assigning, for each family, a confidence score to each of a plurality of calls, the confidence score taking into account a frequency of the call among members of the family; and ii. estimating the frequency of the one or more calls taking into account the confidence scores of the one or more calls assigned to each family.
[084] This disclosure also provides a method that comprises: a. providing a sample comprising between 100 and 100,000 human haploid genome equivalents of cell-free DNA (cfDNA) polynucleotides; and b. tagging the polynucleotides with between 2 and 1,000,000 unique identifiers. In certain embodiments, the number of unique identifiers is at least 3, at least 5, at least 10, at least 15 or at least 25 and at most 100, at most 1,000 or at most 10,000. In certain embodiments, the number of unique identifiers is at most 100, at most 1,000, at most 10,000 or at most 100,000.
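For orientation, haploid genome equivalents can be converted to an approximate DNA mass using the commonly cited figure of about 3.3 pg per human haploid genome. The Python sketch below applies that rounded approximation; the constant and function names are hypothetical.

```python
PG_PER_HAPLOID_GENOME = 3.3  # approximate mass of one human haploid genome, in pg


def genome_equivalents_to_ng(genome_equivalents):
    """Approximate DNA mass (ng) for a number of haploid genome equivalents."""
    return genome_equivalents * PG_PER_HAPLOID_GENOME / 1000.0


def ng_to_genome_equivalents(nanograms):
    """Approximate number of haploid genome equivalents in a DNA mass (ng)."""
    return nanograms * 1000.0 / PG_PER_HAPLOID_GENOME


low, high = genome_equivalents_to_ng(100), genome_equivalents_to_ng(100_000)
```

On this basis, 100 to 100,000 haploid genome equivalents correspond to roughly 0.33 ng to 330 ng of cfDNA.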
[085] This disclosure also provides a method that comprises: a. providing a sample comprising a plurality of human haploid genome equivalents of fragmented polynucleotides; b. determining z, where z is a measure of central tendency (for example, mean, median or mode) of the expected number of duplicate polynucleotides starting at any position in the genome, wherein duplicate polynucleotides have the same start and stop positions; and c. tagging the polynucleotides in the sample with n unique identifiers, where n is between 2 and 100,000*z, between 2 and 10,000*z, between 2 and 1,000*z or between 2 and 100*z.
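The choice of n relative to z can be reasoned about with a birthday-problem calculation. Assuming, for illustration, that each of z molecules sharing the same start and stop positions receives one of n identifiers uniformly at random, the Python sketch below gives the probability that all z receive distinct identifiers and the expected number of distinct identifiers observed. Here z is treated as an integer count, and the function names are hypothetical.

```python
def prob_all_distinct(n_identifiers, z):
    """Probability that z molecules with random tags all get different identifiers."""
    p = 1.0
    for i in range(z):
        p *= max(0.0, 1.0 - i / n_identifiers)
    return p


def expected_distinct_identifiers(n_identifiers, z):
    """Expected number of distinct identifiers seen among z tagged molecules."""
    return n_identifiers * (1.0 - (1.0 - 1.0 / n_identifiers) ** z)


z = 5
for n in (2 * z, 20 * z, 100 * z):
    print(n, prob_all_distinct(n, z), expected_distinct_identifiers(n, z))
```

For example, with z = 5 and n = 100 (that is, n = 20z), all five duplicates receive distinct identifiers with a probability of about 0.90; larger multiples of z push this probability closer to 1.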
[086] This disclosure also provides a method that comprises: a. providing at least one set of labeled parent polynucleotides and, for each set of labeled parent polynucleotides: b. producing a plurality of sequence reads for each labeled parent polynucleotide in the set to produce a set of sequencing reads; and c. collapsing the set of sequencing reads to generate a set of consensus sequences, each consensus sequence corresponding to a unique polynucleotide among the set of labeled parent polynucleotides.
[087] The disclosure provides a method for detecting copy number variation comprising: a) sequencing extracellular polynucleotides from a body sample of an individual, wherein each of the extracellular polynucleotides generates a plurality of sequencing reads; b) filtering out reads that fail to meet a set threshold; c) mapping the sequence reads obtained from step (a), after the reads are filtered, to a reference sequence; d) quantifying or enumerating the mapped reads in two or more predefined regions of the reference sequence; and e) determining copy number variation in one or more of the predefined regions by: (i) normalizing the number of reads in the predefined regions to each other and/or the number of unique sequence reads in the predefined regions to each other; and (ii) comparing the normalized numbers obtained in step (i) to normalized numbers obtained from a control sample.
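The normalization and comparison to a control sample can be sketched in Python by normalizing each sample's region counts to its own median and then taking the log2 ratio of test to control for each region. The median normalization, the optional pseudocount and the function names are illustrative assumptions.

```python
from math import log2
from statistics import median


def median_normalize(counts, pseudocount=0.0):
    """Scale region counts so that the median region equals 1."""
    shifted = [c + pseudocount for c in counts]
    med = median(shifted)
    if med == 0:
        raise ValueError("median region count is zero")
    return [c / med for c in shifted]


def log2_copy_ratios(test_counts, control_counts, pseudocount=0.0):
    """log2 of normalized test over normalized control, region by region."""
    test = median_normalize(test_counts, pseudocount)
    control = median_normalize(control_counts, pseudocount)
    return [log2(t / c) for t, c in zip(test, control)]


ratios = log2_copy_ratios([100, 100, 100, 200], [50, 50, 50, 50])
```

A log2 ratio of 1 corresponds to a doubling of normalized coverage relative to the control. A small positive pseudocount avoids taking the logarithm of zero when a region has no reads.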
[088] The disclosure also provides a method for detecting a rare mutation in a cell-free or substantially cell-free sample obtained from an individual, comprising: a) sequencing extracellular polynucleotides from a body sample of an individual, wherein each of the extracellular polynucleotides generates a plurality of sequencing reads; b) performing multiplex sequencing of regions, or whole-genome sequencing if enrichment is not performed; c) filtering out reads that fail to meet a set threshold; d) mapping the sequence reads derived from the sequencing to a reference sequence; e) identifying a subset of mapped sequence reads that align with a variant of the reference sequence at each mappable base position; f) for each mappable base position, calculating a ratio of (a) the number of mapped sequence reads that include a variant relative to the reference sequence to (b) the total number of sequence reads for that mappable base position; g) normalizing the ratios or frequency of variance for each mappable base position and determining potential rare variant(s) or mutation(s); and h) comparing the resulting number for each of the regions with potential rare variant(s) or mutation(s) to similarly derived numbers from a reference sample.
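Step h leaves the comparison method open. One illustrative choice, not stated in the disclosure, is a one-sided binomial test: the variant fraction observed at a position in the reference sample serves as a background error rate, and the position is flagged when the number of variant reads in the test sample would be very unlikely under that rate. The sketch sums the binomial tail in log space to avoid overflow at high depth; the alpha threshold and function names are hypothetical.

```python
from math import exp, lgamma, log, log1p


def binomial_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0 if k <= n else 0.0
    log_p, log_q = log(p), log1p(-p)
    log_n_fact = lgamma(n + 1)
    total = 0.0
    for i in range(k, n + 1):
        log_term = log_n_fact - lgamma(i + 1) - lgamma(n - i + 1) + i * log_p + (n - i) * log_q
        total += exp(log_term)
    return min(total, 1.0)


def is_rare_variant(variant_reads, depth, background_rate, alpha=1e-6):
    """Flag a position whose variant count is implausible under the background rate."""
    return binomial_sf(variant_reads, depth, background_rate) < alpha


# Reference sample shows a 0.1% background at this position; test depth is 10,000.
flag_high = is_rare_variant(40, 10_000, 0.001)  # about 4x the expected count of 10
flag_low = is_rare_variant(12, 10_000, 0.001)   # consistent with background
```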
[089] The disclosure also provides a method for characterizing the heterogeneity of an abnormal condition in an individual, the method comprising generating a genetic profile of extracellular polynucleotides in the individual, wherein the genetic profile comprises a plurality of data resulting from copy number variation and rare mutation analyses.
[090] In some embodiments, the prevalence/concentration of each rare variant identified in the individual is reported and quantified simultaneously. In some embodiments, a confidence score regarding the prevalence/concentration of rare variants in the individual is reported.
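A confidence interval is one way to attach a confidence score to a reported prevalence. The Python sketch below uses the Wilson score interval for a binomial proportion, chosen here for illustration, applied to the number of molecules (or families) carrying a variant out of the total observed. The function name is hypothetical.

```python
from math import sqrt


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p_hat + z * z / (2 * trials)) / denom
    half_width = (z / denom) * sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials))
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# 20 of 2,000 unique molecules carry the variant: point estimate 1%.
low, high = wilson_interval(20, 2000)
```

Unlike the simple normal approximation, the Wilson interval stays within [0, 1] and remains informative when a rare variant is seen in few or no molecules.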
[091] In some embodiments, extracellular polynucleotides comprise DNA. In some embodiments, extracellular polynucleotides comprise RNA.
[092] In some embodiments, the methods further comprise isolating extracellular polynucleotides from the body sample. In some embodiments, the isolating comprises a method for circulating nucleic acid extraction and isolation. In some embodiments, the methods further comprise fragmenting said isolated extracellular polynucleotides. In some embodiments, the body sample is selected from the group consisting of blood, plasma, serum, urine, saliva, mucosal excretions, sputum, stool and tears.
[093] In some embodiments, the methods further comprise the step of determining the percentage of sequences having a copy number variation or a rare mutation or variant in said body sample. In some embodiments, the determining comprises calculating the percentage of predefined regions with an amount of polynucleotides above or below a predetermined threshold.
[094] In some embodiments, the individual is suspected of having an abnormal condition. In some embodiments, the abnormal condition is selected from the group consisting of mutations, rare mutations, indels, copy number variations, transversions, translocations, inversions, deletions, aneuploidy, partial aneuploidy, polyploidy, chromosomal instability, chromosomal structure alterations, gene fusions, chromosome fusions, gene truncations, gene amplification, gene duplications, chromosomal lesions, DNA lesions, abnormal changes in nucleic acid chemical modifications, abnormal changes in epigenetic patterns, abnormal changes in nucleic acid methylation, infection and cancer.
[095] In some embodiments, the individual is a pregnant woman. In some embodiments, the copy number variation, rare mutation or genetic variant is indicative of a fetal abnormality. In some embodiments, the fetal abnormality is selected from the group consisting of mutations, rare mutations, indels, copy number variations, transversions, translocations, inversions, deletions, aneuploidy, partial aneuploidy, polyploidy, chromosomal instability, chromosomal structure alterations, gene fusions, chromosome fusions, gene truncations, gene amplification, gene duplications, chromosomal lesions, DNA lesions, abnormal changes in nucleic acid chemical modifications, abnormal changes in epigenetic patterns, abnormal changes in nucleic acid methylation, infection and cancer.
[096] In some embodiments, the methods further comprise attaching one or more barcodes to the extracellular polynucleotides or fragments thereof prior to sequencing. In some embodiments, each barcode attached to the extracellular polynucleotides or fragments thereof prior to sequencing is unique. In some embodiments, each barcode attached to the extracellular polynucleotides or fragments thereof prior to sequencing is not unique.
[097] In some embodiments, the methods additionally comprise selectively enriching the individual's genome or transcriptome regions prior to sequencing. In some embodiments, the methods additionally comprise non-selectively enriching the individual's genome or transcriptome regions prior to sequencing.
[098] In some embodiments, the methods further comprise attaching one or more barcodes to the extracellular polynucleotides or fragments thereof prior to any amplification or enrichment step. In some embodiments, the barcode is a polynucleotide. In some embodiments, the barcode comprises a random sequence. In some embodiments, the barcode comprises a fixed or semi-random set of oligonucleotides that, in combination with the diversity of molecules sequenced from a selected region, enables identification of unique molecules. In some embodiments, the barcodes comprise oligonucleotides that are at least 3, 5, 10, 15, 20, 25, 30, 35, 40, 45 or 50 bases in length.
[099] In some embodiments, the methods further comprise amplifying the extracellular polynucleotides or fragments thereof. In some embodiments, the amplification comprises global amplification or whole-genome amplification. In some embodiments, the amplification comprises selective amplification. In some embodiments, the amplification comprises non-selective amplification. In some embodiments, suppression amplification or subtractive enrichment is performed.
[100] In some embodiments, sequence reads of unique identity are detected based on sequence information at the beginning (start) and end (stop) regions of the sequence read and on the length of the sequence read. In some embodiments, sequence molecules of unique identity are detected based on sequence information at the beginning (start) and end (stop) regions of the sequence read, on the length of the sequence read and on the attachment of a barcode.
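The two criteria can be sketched in Python as identity keys built from a read's start position, stop position and length, optionally extended with its barcode. The record layout and function names are hypothetical. Reads that share coordinates but carry different barcodes are counted as distinct molecules only when the barcode is part of the key.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MappedRead:
    start: int                      # mapped start (start region) of the read
    stop: int                       # mapped end (stop region) of the read
    length: int                     # read length
    barcode: Optional[str] = None   # attached barcode, if any


def identity_key(read, use_barcode=True):
    """Key under which reads are treated as coming from the same molecule."""
    key = (read.start, read.stop, read.length)
    if use_barcode and read.barcode is not None:
        key += (read.barcode,)
    return key


def count_unique_molecules(reads, use_barcode=True):
    """Number of distinct identity keys among the reads."""
    return len({identity_key(r, use_barcode) for r in reads})


reads = [
    MappedRead(100, 260, 160, "ACG"),
    MappedRead(100, 260, 160, "ACG"),  # duplicate of the first read
    MappedRead(100, 260, 160, "TTA"),  # same coordinates, different barcode
    MappedRead(500, 650, 150, "ACG"),
]
```

Without barcodes, the first three reads collapse to a single molecule because they share coordinates; with barcodes, the TTA read is recognized as a separate molecule.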
[101] In some embodiments, the methods further comprise removing a subset of the reads from further analysis prior to quantifying or enumerating the reads. In some embodiments, the removing comprises filtering out reads with an accuracy or quality score below a threshold, for example, 90%, 99%, 99.9% or 99.99%, and/or a mapping score below a threshold, for example, 90%, 99%, 99.9% or 99.99%. In some embodiments, the methods further comprise filtering out reads with a quality score below a set threshold.
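Accuracy thresholds such as these correspond to Phred-scaled scores through Q = -10 log10(1 - accuracy), so 90%, 99%, 99.9% and 99.99% correspond to Q10, Q20, Q30 and Q40. The Python sketch below applies such thresholds to a single read. Using the mean base quality is a simplification assumed here, and the function names are hypothetical.

```python
from math import log10
from statistics import mean


def accuracy_to_phred(accuracy):
    """Phred score for a given accuracy, e.g. 0.999 -> about 30."""
    return -10.0 * log10(1.0 - accuracy)


def passes_filters(base_qualities, mapping_quality, min_accuracy=0.999, min_mapping_accuracy=0.99):
    """Keep a read if its mean base quality and its mapping quality meet the thresholds."""
    # A tiny tolerance absorbs floating-point error in the threshold conversion.
    base_ok = mean(base_qualities) >= accuracy_to_phred(min_accuracy) - 1e-9
    mapping_ok = mapping_quality >= accuracy_to_phred(min_mapping_accuracy) - 1e-9
    return base_ok and mapping_ok
```

For example, a read with base qualities of 30 to 35 and a mapping quality of 40 passes both default thresholds, while the same read with a mapping quality of 10 is removed.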
[102] In some embodiments, the predefined regions are uniform or substantially uniform in size. In some embodiments, the predefined regions are at least about 10 kb, 20 kb, 30 kb, 40 kb, 50 kb, 60 kb, 70 kb, 80 kb, 90 kb or 100 kb in size.
[103] In some embodiments, at least 50, 100, 200, 500, 1,000, 2,000, 5,000, 10,000, 20,000 or 50,000 regions are analyzed.
[104] In some embodiments, the variant occurs in a region of the genome selected from the group consisting of gene fusions, gene duplications, gene deletions, gene translocations, microsatellite regions, gene fragments or combinations thereof. In some embodiments, the variant occurs in a region of the genome selected from the group consisting of genes, oncogenes, tumor suppressor genes, promoters, regulatory sequence elements or combinations thereof. In some embodiments, the variant is a nucleotide variant, a single base substitution or small indel, a transversion, translocation, inversion, deletion, truncation or gene truncation of about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15 or 20 nucleotides in length.
[105] In some embodiments, the methods further comprise correcting/normalizing/adjusting the number of mapped reads using barcodes or unique properties of individual reads. In some embodiments, the reads are enumerated by enumerating the unique barcodes in each of the predefined regions and normalizing those numbers across at least a subset of the predefined regions that were sequenced.
[106] In some embodiments, samples taken at successive time intervals from the same individual are analyzed and compared to previous sample results. In some embodiments, the method further comprises amplifying the barcode-linked extracellular polynucleotides. In some embodiments, the methods further comprise determining the frequency of partial copy number variation, determining loss of heterozygosity, performing gene expression analysis, performing epigenetic analysis and/or performing hypermethylation analysis.
[107] The disclosure also provides a method comprising determining copy number variation or performing rare mutation analysis on a cell-free or substantially cell-free sample obtained from an individual using multiplex sequencing.
[108] In some embodiments, multiplex sequencing comprises performing more than 10,000 sequencing reactions. In some embodiments, multiplex sequencing comprises simultaneously sequencing at least 10,000 different reads. In some embodiments, multiplex sequencing comprises performing data analysis on at least 10,000 different reads across the genome. In some embodiments, normalization and detection are performed using one or more of hidden Markov models, dynamic programming, support vector machines, probabilistic or Bayesian modeling, trellis decoding, Viterbi decoding, expectation maximization, Kalman filtering or neural network methodologies. In some embodiments, the methods further comprise monitoring disease progression, monitoring residual disease, monitoring therapy, diagnosing a condition, predicting a condition or selecting a therapy based on the variants discovered in the individual. In some embodiments, a therapy is modified based on the most recent sample analysis. In some embodiments, the genetic profile of a tumor, infection or other tissue abnormality is inferred.
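As an illustration of hidden Markov model segmentation with Viterbi decoding, the Python sketch below assigns each region's normalized read ratio to one of a few copy-ratio states. The state means, the Gaussian emission model, the standard deviation and the switching probability are all illustrative assumptions, not parameters from the disclosure, and the function names are hypothetical.

```python
from math import log, pi


def gaussian_logpdf(x, mean, sd):
    return -0.5 * log(2 * pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi_segment(ratios, state_means=(0.5, 1.0, 1.5), sd=0.1, stay_prob=0.99):
    """Most likely sequence of copy-ratio states for per-region normalized ratios."""
    if not ratios:
        return []
    n_states = len(state_means)
    if n_states < 2:
        raise ValueError("need at least two states")
    log_stay = log(stay_prob)
    log_switch = log((1.0 - stay_prob) / (n_states - 1))

    def log_transition(prev, cur):
        return log_stay if prev == cur else log_switch

    # score[s]: best log-probability of any state path ending in state s.
    score = [-log(n_states) + gaussian_logpdf(ratios[0], m, sd) for m in state_means]
    back_pointers = []
    for x in ratios[1:]:
        pointers, new_score = [], []
        for cur, m in enumerate(state_means):
            best = max(range(n_states), key=lambda prev: score[prev] + log_transition(prev, cur))
            pointers.append(best)
            new_score.append(score[best] + log_transition(best, cur) + gaussian_logpdf(x, m, sd))
        back_pointers.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back_pointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [state_means[s] for s in path]


ratios = [1.0, 0.98, 1.02, 1.5, 1.48, 1.52, 1.0, 1.01]
segments = viterbi_segment(ratios)
```

The high stay probability penalizes state changes, so isolated noisy regions tend to stay in the surrounding state, while a run of consistently elevated ratios, as in regions four to six here, is assigned to the 1.5 state.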
[109] In some embodiments, the growth, remission or evolution of a tumor, infection or other tissue abnormality is monitored. In some embodiments, sequences related to the individual's immune system are analyzed and monitored at single instances or over time. In some embodiments, identification of a variant is accompanied by an imaging test (for example, CT, PET-CT, MRI, X-ray, ultrasound) to locate the tissue abnormality suspected of giving rise to the identified variant. In some embodiments, the analysis further includes the use of genetic data obtained from a tissue or tumor biopsy of the same patient. In some embodiments, the phylogenetics of a tumor, infection or other tissue abnormality is inferred. In some embodiments, the method further comprises performing population-based no-calling and identification of low-confidence regions. In some embodiments, obtaining measurement data for sequence coverage comprises measuring the depth of sequence coverage at each position of the genome. In some embodiments, correcting the measurement data for sequence coverage bias comprises calculating window-averaged coverage. In some embodiments, correcting the measurement data for sequence coverage bias comprises making adjustments to account for GC bias in the library construction and sequencing process. In some embodiments, correcting the measurement data for sequence coverage bias comprises making adjustments based on additional weighting factors associated with individual mappings to compensate for the bias.
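One simple way to adjust for GC bias, assumed here for illustration (LOESS regression of coverage on GC content is a common alternative), is to divide each region's count by the median count of regions with similar GC content. The Python sketch assigns regions to GC strata of equal width; the number of strata and the function name are hypothetical.

```python
from collections import defaultdict
from statistics import median


def gc_correct(counts, gc_fractions, n_strata=10):
    """Divide each region's count by the median count of its GC stratum."""
    strata = [min(int(gc * n_strata), n_strata - 1) for gc in gc_fractions]
    counts_by_stratum = defaultdict(list)
    for count, stratum in zip(counts, strata):
        counts_by_stratum[stratum].append(count)
    stratum_median = {s: median(v) for s, v in counts_by_stratum.items()}
    return [
        count / stratum_median[s] if stratum_median[s] > 0 else 0.0
        for count, s in zip(counts, strata)
    ]


counts = [100, 110, 90, 200, 220, 180]  # GC-rich regions are over-covered
gc = [0.35, 0.36, 0.34, 0.55, 0.56, 0.54]
corrected = gc_correct(counts, gc)
```

After correction, both GC strata are centred on 1.0, so a coverage difference driven by GC content is not mistaken for a copy number change. With few regions per stratum, a genuine copy number change could be partly absorbed by the stratum median, so in practice many regions per stratum would be used.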
[110] In some embodiments, the extracellular polynucleotide is derived from a diseased cell source. In some embodiments, the extracellular polynucleotide is derived from a healthy cell source.
[111] The disclosure also provides a system comprising a computer-readable medium for performing the following steps: selecting predefined regions in a genome; enumerating the number of sequence reads in the predefined regions; normalizing the number of sequence reads across the predefined regions; and determining the percentage of copy number variation in the predefined regions.
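The four steps of [111] can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation. Reads are represented only by their start coordinates. A read is counted in a region when its start falls in the half-open interval [start, end). Normalization is per-base read density scaled to the mean across regions. The percentage of copy number variation is taken relative to a reference profile normalized in the same way. All function names are hypothetical.

```python
from bisect import bisect_left


def count_reads(read_starts, regions):
    """Enumerate reads whose start position falls in each predefined region.

    read_starts: dict chrom -> sorted list of read start positions
    regions: dict name -> (chrom, start, end), half-open [start, end)
    """
    counts = {}
    for name, (chrom, start, end) in regions.items():
        positions = read_starts.get(chrom, [])
        # Number of sorted positions p with start <= p < end.
        counts[name] = bisect_left(positions, end) - bisect_left(positions, start)
    return counts


def normalize(counts, regions):
    """Reads per base in each region, scaled so the mean over regions is 1.0."""
    density = {n: counts[n] / (regions[n][2] - regions[n][1]) for n in counts}
    mean = sum(density.values()) / len(density)
    return {n: d / mean for n, d in density.items()}


def percent_cnv(sample_norm, reference_norm):
    """Percentage deviation of each region from the reference profile."""
    return {n: 100.0 * (sample_norm[n] / reference_norm[n] - 1.0)
            for n in sample_norm}
```

For example, a region whose normalized sample value is 1.5 against a reference value of 1.0 is reported as a 50% copy number gain.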
[112] In some embodiments, the entire genome or at least 85% of the genome is analyzed. In some embodiments, the computer-readable medium provides the end user with data on the percentage of cancer DNA or RNA in plasma or serum. In some embodiments, the copy number variants identified are fractional (i.e., non-integer levels) owing to heterogeneity in the sample. In some embodiments, enrichment of selected regions is performed. In some embodiments, copy number variation information is extracted simultaneously based on the methods described herein. In some embodiments, the methods comprise an initial step of restricting the starting polynucleotides to limit the number of initial starting copies or the diversity of polynucleotides in the sample.
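The fractional copy-number levels mentioned in [112] can be illustrated with a simple two-component mixture. For illustration only, assume the following: a fraction $f$ of the cell-free DNA at a locus comes from tumor cells carrying $c$ copies; the remainder comes from diploid normal cells; and each genome copy contributes equally to the sample. The symbols $f$, $c$ and $\bar{c}$ are introduced here and are not defined in the disclosure. Under these assumptions, the expected copy number and its ratio to the diploid level are

$$\bar{c} = 2(1-f) + c\,f, \qquad \frac{\bar{c}}{2} = 1 + \frac{f\,(c-2)}{2}.$$

For example, with $f = 0.10$ and $c = 4$, the expected copy number is $\bar{c} = 1.8 + 0.4 = 2.2$. That is a relative level of 1.1, rather than the integer ratio of 2 that a pure tumor sample would show.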
[113] The disclosure also provides a method for detecting a rare mutation in a cell-free or substantially cell-free sample obtained from an individual, comprising: a) sequencing extracellular polynucleotides from a body sample of the individual, wherein each of the extracellular polynucleotides generates a plurality of sequencing reads; b) filtering out reads that fail to meet a set quality threshold; c) mapping the sequence reads derived from the sequencing to a reference sequence; d) identifying a subset of mapped sequence reads that align with a variant of the reference sequence at each mappable base position; e) for each mappable base position, calculating the ratio of (a) the number of mapped sequence reads that include a variant relative to the reference sequence to (b) the total number of sequence reads at that mappable base position; f) normalizing the ratios or variant frequency for each mappable base position and determining potential rare variant(s) or other genetic alteration(s); and g) comparing the resulting number for each of the regions with potential rare variant(s) or mutation(s) to numbers derived in the same way from a reference sample.
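Steps b), e), f), and g) of [113] can be sketched as follows. This is a simplified illustration under assumptions that the disclosure does not specify:

- Mapping (steps c and d) has already been done, so the input is a list of (position, base, quality) tuples for aligned bases.
- The quality filter is applied per base call rather than per read.
- "Normalizing" against the reference sample is modeled as subtracting the reference sample's background variant ratio.

The `min_quality` and `margin` thresholds and the function names are hypothetical.

```python
from collections import defaultdict


def variant_ratios(reads, reference, min_quality=30):
    """Per-position fraction of mapped base calls that differ from the reference.

    reads: iterable of (position, base, quality) tuples for aligned bases
    reference: dict position -> reference base
    """
    total = defaultdict(int)
    variant = defaultdict(int)
    for pos, base, qual in reads:
        if qual < min_quality or pos not in reference:
            continue  # drop calls below the quality threshold or off-target
        total[pos] += 1
        if base != reference[pos]:
            variant[pos] += 1
    return {pos: variant[pos] / total[pos] for pos in total}


def candidate_rare_variants(sample, background, margin=0.001):
    """Positions whose variant ratio exceeds the reference-sample ratio by margin."""
    return sorted(pos for pos, ratio in sample.items()
                  if ratio - background.get(pos, 0.0) > margin)
```

In practice, the margin would be chosen from the error profile of the sequencing platform. A fixed value is used here only to keep the sketch short.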
[114] The disclosure also provides a method comprising: a. providing at least one set of tagged parent polynucleotides and, for each set of tagged parent polynucleotides: b. amplifying the tagged parent polynucleotides in the set to produce a corresponding set of amplified progeny polynucleotides; c. sequencing a subset (including a proper subset) of the set of amplified progeny polynucleotides to produce a set of sequencing reads; and d. collapsing the set of sequencing reads to generate a set of consensus sequences, each consensus sequence corresponding to a unique polynucleotide among the set of tagged parent polynucleotides.
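Step d. of [114], collapsing reads into one consensus sequence per tagged parent molecule, can be sketched in Python. This is a minimal sketch under stated assumptions. Each read is a (tag, sequence) pair. Reads that share a tag are treated as progeny of one uniquely tagged parent and have equal length. The consensus is a per-position majority vote. The disclosure does not prescribe majority voting, and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def collapse(reads):
    """Collapse reads into one consensus sequence per tag.

    reads: iterable of (tag, sequence) pairs; reads sharing a tag are assumed
    to descend from the same tagged parent polynucleotide and to be equal length.
    """
    families = defaultdict(list)
    for tag, seq in reads:
        families[tag].append(seq)
    consensus = {}
    for tag, seqs in families.items():
        # Majority vote at each position; ties resolve to the base seen first.
        consensus[tag] = "".join(Counter(column).most_common(1)[0][0]
                                 for column in zip(*seqs))
    return consensus


collapse([("AAT", "ACGT"), ("AAT", "ACGT"), ("AAT", "ACTT"), ("GGC", "TTTT")])
# {'AAT': 'ACGT', 'GGC': 'TTTT'}
```

The isolated G-to-T change in one progeny read is voted out, which is how collapsing suppresses amplification and sequencing errors.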
[115] In some embodiments, each polynucleotide in a set is mappable to a reference sequence. In some embodiments, the methods comprise providing a plurality of sets of tagged parent polynucleotides, each set mapping to a different mappable position in the reference sequence. In some embodiments, the method further comprises: e) analyzing the set of consensus sequences for each set of tagged parent molecules, separately or in combination. In some embodiments, the method further comprises converting initial starting genetic material into the tagged parent polynucleotides. In some embodiments, the initial starting genetic material comprises no more than 100 ng of polynucleotides. In some embodiments, the method comprises restricting the initial starting genetic material before conversion. In some embodiments, the method comprises converting the initial starting genetic material into tagged parent polynucleotides with a conversion efficiency of at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 80%, or at least 90%. In some embodiments, the conversion comprises any of blunt-end ligation, sticky-end ligation, molecular inversion probes, PCR, ligation-based PCR, single-strand ligation, and single-strand circularization. In some embodiments, the initial starting genetic material is cell-free nucleic acid. In some embodiments, a plurality of sets map to different mappable positions in a reference sequence from the same genome.
[116] In some embodiments, each tagged parent polynucleotide in the set is uniquely tagged. In some embodiments, each set of parent polynucleotides maps to a position in a reference sequence, and the polynucleotides in each set are not uniquely tagged. In some embodiments, the generation of consensus sequences is based on information from the tag and/or at least one of (i) sequence information at the start region of the sequence read, (ii) sequence information at the stop region of the sequence read, and (iii) the length of the sequence read.
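When tags are not unique ([116]), reads can be grouped by combining the tag with alignment information. The Python sketch below assumes one particular family key: the tag, the mapped start and stop coordinates, and the read length. Keying on all three alignment features is an illustrative choice, since the disclosure allows any subset of them. The `MappedRead` type and the function name are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class MappedRead(NamedTuple):
    tag: str    # molecular tag; may be shared by different parent molecules
    start: int  # mapped start coordinate of the read
    stop: int   # mapped stop coordinate of the read
    seq: str    # read sequence


def group_families(reads):
    """Group reads whose tag, start, stop and length all agree."""
    families = defaultdict(list)
    for read in reads:
        key = (read.tag, read.start, read.stop, len(read.seq))
        families[key].append(read)
    return dict(families)
```

With a small tag alphabet, two parent molecules rarely share both a tag and identical fragment endpoints. This is why tag plus alignment features can act as a near-unique molecular identifier.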
[117] In some embodiments, the method comprises sequencing a subset of the set of amplified progeny polynucleotides sufficient to produce sequence reads for at least one progeny from each of at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, at least 99%, at least 99.9%, or at least 99.99% of the unique polynucleotides in the set of tagged parent polynucleotides. In some embodiments, the at least one progeny is a plurality of progeny, for example, at least 2, at least 5, or at least 10 progeny. In some embodiments, the number of sequence reads in the set of sequence reads is greater than the number of unique tagged parent polynucleotides in the set of tagged parent polynucleotides. In some embodiments, the sequenced subset of the set of amplified progeny polynucleotides is sufficiently large that any nucleotide sequence represented in the set of tagged parent polynucleotides at a percentage equal to the sequencing error rate of the sequencing platform used has at least a 50%, at least a 60%, at least a 70%, at least an 80%, at least a 90%, at least a 95%, at least a 98%, at least a 99%, at least a 99.9%, or at least a 99.99% chance of being represented in the set of consensus sequences.
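The representation probabilities in [117] depend on how many unique parent molecules end up in the consensus set. The disclosure does not state a sampling model. The Python sketch below assumes the simplest one: each of $n$ parent molecules recovered as a consensus sequence carries the sequence of interest independently with probability $p$. The sequence is then seen at least once with probability $1-(1-p)^n$. The function names are hypothetical.

```python
import math


def prob_represented(p, n):
    """Chance that a sequence at parent-level fraction p appears at least once
    among n independently sampled unique parent molecules."""
    return 1.0 - (1.0 - p) ** n


def molecules_needed(p, target):
    """Smallest n with prob_represented(p, n) >= target (0 < p < 1, 0 < target < 1)."""
    n = math.ceil(math.log(1.0 - target) / math.log(1.0 - p))
    # Nudge n to correct any floating-point rounding at the boundary.
    while prob_represented(p, n) < target:
        n += 1
    while n > 1 and prob_represented(p, n - 1) >= target:
        n -= 1
    return n
```

For instance, `molecules_needed(0.001, 0.99)` gives the number of unique molecules needed for a 99% chance of observing a sequence present at 0.1%. Under this model that is roughly 4,600.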
[118] In some embodiments, the method comprises enriching the set of amplified progeny polynucleotides for polynucleotides mapping to one or more selected mappable positions in a reference sequence by: (i) selective amplification of sequences from the initial starting genetic material converted into tagged parent polynucleotides; (ii) selective amplification of tagged parent polynucleotides; (iii) selective sequence capture of amplified progeny polynucleotides; or (iv) selective sequence capture of initial starting genetic material.
[119] In some embodiments, the analysis comprises normalizing a measurement (for example, a number) taken from a set of consensus sequences against a measurement taken from a set of consensus sequences in a control sample. In some embodiments, the analysis comprises detecting mutations, rare mutations, indels, copy number variations, transversions, translocations, inversions, deletions, aneuploidy, partial aneuploidy, polyploidy, chromosomal instability, chromosomal structure alterations, gene fusions, chromosome fusions, gene truncations, gene amplification, gene duplications, chromosomal lesions, DNA lesions, abnormal changes in nucleic acid chemical modifications, abnormal changes in epigenetic patterns, abnormal changes in nucleic acid methylation, infection, or cancer.
[120] In some embodiments, the polynucleotides comprise DNA, RNA, a combination of the two, or DNA plus RNA-derived cDNA. In some embodiments, a particular subset of polynucleotides is selected for, or enriched, based on polynucleotide length in base pairs from the initial set of polynucleotides or from the amplified polynucleotides. In some embodiments, the analysis further comprises detecting and monitoring an abnormality or disease within an individual, such as infection and/or cancer. In some embodiments, the method is performed in combination with immune repertoire profiling. In some embodiments, the polynucleotides are extracted from a body sample selected from the group consisting of blood, plasma, serum, urine, saliva, mucosal excretions, sputum, stool, and tears. In some embodiments, the collapsing comprises detecting and/or correcting errors, nicks, or lesions present in the sense or antisense strand of the tagged parent polynucleotides or the amplified progeny polynucleotides.
[121] The disclosure also provides a method comprising detecting a genetic variation in non-uniquely tagged initial starting genetic material with a sensitivity of at least 5%, at least 1%, at least 0.5%, at least 0.1%, or at least 0.05%.
[122] In some embodiments, the initial starting genetic material is provided in an amount of less than 100 ng of nucleic acid, the genetic variation is copy number/heterozygosity variation, and the detection is performed with sub-chromosomal resolution; for example, at least 100-megabase, at least 10-megabase, at least 1-megabase, at least 100-kilobase, at least 10-kilobase, or at least 1-kilobase resolution. In some embodiments, the method comprises providing a plurality of sets of tagged parent polynucleotides, wherein each set is mappable to a different mappable position in a reference sequence. In some embodiments, the mappable position in the reference sequence is the locus of a tumor marker, and the analysis comprises detecting the tumor marker in the set of consensus sequences.
[123] In some embodiments, the tumor marker is present in the set of consensus sequences at a frequency lower than the error rate introduced in the amplification step. In some embodiments, the at least one set is a plurality of sets, and the mappable position of the reference sequence comprises a plurality of mappable positions in the reference sequence, each mappable position being the locus of a tumor marker. In some embodiments, the analysis comprises detecting copy number variation of the consensus sequences between at least two sets of parent polynucleotides. In some embodiments, the analysis comprises detecting the presence of sequence variations compared with the reference sequences.
[124] In some embodiments, the analysis comprises detecting the presence of sequence variations compared with the reference sequences and detecting copy number variation of the consensus sequences between at least two sets of parent polynucleotides. In some embodiments, the collapsing comprises: (i) grouping sequence reads sequenced from amplified progeny polynucleotides into families, each family amplified from the same tagged parent polynucleotide; and (ii) determining a consensus sequence based on the sequence reads in a family.
[125] The disclosure also provides a system comprising a computer-readable medium for performing the following steps: a. accepting at least one set of tagged parent polynucleotides and, for each set of tagged parent polynucleotides: b. amplifying the tagged parent polynucleotides in the set to produce a corresponding set of amplified progeny polynucleotides; c. sequencing a subset (including a proper subset) of the set of amplified progeny polynucleotides to produce a set of sequencing reads; d. collapsing the set of sequencing reads to generate a set of consensus sequences, each consensus sequence corresponding to a unique polynucleotide from the set of tagged parent polynucleotides; and, optionally, e) analyzing the set of consensus sequences for each set of tagged parent molecules.
[126] The disclosure also provides a method comprising detecting the presence or absence of a genetic alteration or an amount of genetic variation in an individual, wherein the detection is performed with the aid of sequencing of cell-free nucleic acid, and wherein at least 10% of the individual's genome is sequenced.
[127] The disclosure also provides a method comprising detecting the presence or absence of a genetic alteration or an amount of genetic variation in an individual, wherein the detection is performed with the aid of sequencing of cell-free nucleic acid, and wherein at least 20% of the individual's genome is sequenced.
[128] The disclosure also provides a method comprising detecting the presence or absence of a genetic alteration or an amount of genetic variation in an individual, wherein the detection is performed with the aid of sequencing of cell-free nucleic acid, and wherein at least 30% of the individual's genome is sequenced.
[129] The disclosure also provides a method comprising detecting the presence or absence of a genetic alteration or an amount of genetic variation in an individual, wherein the detection is performed with the aid of sequencing of cell-free nucleic acid, and wherein at least 40% of the individual's genome is sequenced.
[130] The disclosure also provides a method comprising detecting the presence or absence of a genetic alteration or an amount of genetic variation in an individual, wherein the detection is performed with the aid of sequencing of cell-free nucleic acid, and wherein at least 50% of the individual's genome is sequenced.
[131] The disclosure also provides a method comprising detecting the presence or absence of a genetic alteration or an amount of genetic variation in an individual, wherein the detection is performed with the aid of sequencing of cell-free nucleic acid, and wherein at least 60% of the individual's genome is sequenced.
[132] The disclosure also provides a method comprising detecting the presence or absence of a genetic alteration or an amount of genetic variation in an individual, wherein the detection is performed with the aid of sequencing of cell-free nucleic acid, and wherein at least 70% of the individual's genome is sequenced.
[133] The disclosure also provides a method comprising detecting the presence or absence of a genetic alteration or an amount of genetic variation in an individual, wherein the detection is performed with the aid of sequencing of cell-free nucleic acid, and wherein at least 80% of the individual's genome is sequenced.
[134] The disclosure also provides a method comprising detecting the presence or absence of a genetic alteration or an amount of genetic variation in an individual, wherein the detection is performed with the aid of sequencing of cell-free nucleic acid, and wherein at least 90% of the individual's genome is sequenced.
[135] The disclosure also provides a method comprising detecting the presence or absence of a genetic alteration and an amount of genetic variation in an individual, wherein the detection is performed with the aid of sequencing of cell-free nucleic acid, and wherein at least 10% of the individual's genome is sequenced.
[136] The disclosure also provides a method comprising detecting the presence or absence of a genetic alteration and an amount of genetic variation in an individual, wherein the detection is performed with the aid of sequencing of cell-free nucleic acid, and wherein at least 20% of the individual's genome is sequenced.
[137] The disclosure also provides a method comprising detecting the presence or absence of a genetic alteration and an amount of genetic variation in an individual, wherein the detection is performed with the aid of sequencing of cell-free nucleic acid, and wherein at least 30% of the individual's genome is sequenced.
[138] The disclosure also provides a method comprising detecting the presence or absence of a genetic alteration and an amount of genetic variation in an individual, wherein the detection is performed with the aid of sequencing of cell-free nucleic acid, and wherein at least 40% of the individual's genome is sequenced.
[139] The disclosure also provides a method comprising detecting the presence or absence of a genetic alteration and an amount of genetic variation in an individual, wherein the detection is performed with the aid of sequencing of cell-free nucleic acid, and wherein at least 50% of the individual's genome is sequenced.
[140] The disclosure also provides a method comprising detecting the presence or absence of a genetic alteration and an amount of genetic variation in an individual, wherein the detection is performed with the aid of sequencing of cell-free nucleic acid, and wherein at least 60% of the individual's genome is sequenced.
[141] The disclosure also provides a method comprising detecting the presence or absence of a genetic alteration and an amount of genetic variation in an individual, wherein the detection is performed with the aid of sequencing of cell-free nucleic acid, and wherein at least 70% of the individual's genome is sequenced.
[142] The disclosure also provides a method comprising detecting the presence or absence of a genetic alteration and an amount of genetic variation in an individual, wherein the detection is performed with the aid of sequencing of cell-free nucleic acid, and wherein at least 80% of the individual's genome is sequenced.
[143] The disclosure also provides a method comprising detecting the presence or absence of a genetic alteration and an amount of genetic variation in an individual, wherein the detection is performed with the aid of sequencing of cell-free nucleic acid, and wherein at least 90% of the individual's genome is sequenced.
[144] In some embodiments, the genetic alteration is copy number variation or one or more rare mutations. In some embodiments, the genetic variation comprises one or more random variants and one or more polymorphisms. In some embodiments, the genetic alteration and/or the amount of genetic variation in the individual can be compared with a genetic alteration and/or amount of genetic variation in one or more individuals with a known disease. In some embodiments, the genetic alteration and/or the amount of genetic variation in the individual can be compared with a genetic alteration and/or amount of genetic variation in one or more individuals without a disease. In some embodiments, the cell-free nucleic acid is DNA. In some embodiments, the cell-free nucleic acid is RNA. In some embodiments, the cell-free nucleic acid is DNA and RNA. In some embodiments, the disease is cancer or pre-cancer. In some embodiments, the method further comprises diagnosis or treatment of a disease.
[145] The disclosure also provides a method comprising: a. providing at least one set of tagged parent polynucleotides and, for each set of tagged parent polynucleotides: b. amplifying the tagged parent polynucleotides in the set to produce a corresponding set of amplified progeny polynucleotides; c. sequencing a subset (including a proper subset) of the set of amplified progeny polynucleotides to produce a set of sequencing reads; d. collapsing the set of sequencing reads to generate a set of consensus sequences, each consensus sequence corresponding to a unique polynucleotide among the set of tagged parent polynucleotides; and e) filtering out, from the consensus sequences, those that fail to meet a quality threshold.
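Step e) of [145], filtering consensus sequences against a quality threshold, can be sketched in Python. According to [146], the threshold may consider the number of reads collapsed into each consensus. The sketch therefore drops families with fewer than `min_reads` reads. As an additional, hypothetical criterion not taken from the disclosure, it also drops families whose weakest position has less than `min_agreement` of reads supporting the consensus base. The function names and both threshold values are illustrative assumptions.

```python
from collections import Counter


def consensus_with_agreement(seqs):
    """Majority-vote consensus plus the lowest per-position agreement fraction."""
    bases, agreement = [], 1.0
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        bases.append(base)
        agreement = min(agreement, count / len(column))
    return "".join(bases), agreement


def filter_consensus(families, min_reads=2, min_agreement=0.9):
    """Keep consensus sequences whose family passes both quality checks.

    families: dict tag -> list of equal-length read sequences in the family
    """
    kept = {}
    for tag, seqs in families.items():
        if len(seqs) < min_reads:
            continue  # too few reads were collapsed into this consensus
        consensus, agreement = consensus_with_agreement(seqs)
        if agreement >= min_agreement:
            kept[tag] = consensus
    return kept
```

Singleton families are dropped by default because a single read offers no internal check against sequencing error.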
[146] In some embodiments, the quality threshold considers the number of sequence reads from the amplified progeny polynucleotides collapsed into a consensus sequence.
[147] The disclosure also provides a system comprising a computer-readable medium for carrying out the methods described herein.
[148] The disclosure also provides a method comprising: a. providing at least one set of tagged parent polynucleotides, each set mapping to a different mappable position in a reference sequence in one or more genomes, and, for each set of tagged parent polynucleotides: i) amplifying the first polynucleotides to produce a set of amplified polynucleotides; ii) sequencing a subset of the set of amplified polynucleotides to produce a set of sequencing reads; and iii) collapsing the sequence reads by: (i) grouping the sequence reads sequenced from the amplified progeny polynucleotides into families, each family amplified from the same tagged parent polynucleotide.
[149] In some embodiments, the collapsing further comprises determining a quantitative measure of the sequence reads in each family. In some embodiments, the method further comprises: a) determining a quantitative measure of unique families; and b) based on (1) the quantitative measure of unique families and (2) the quantitative measure of sequence reads in each group, inferring a measure of unique tagged parent polynucleotides in the set. In some embodiments, the inference is performed using statistical and probabilistic models. In some embodiments, the at least one set is a plurality of sets. In some embodiments, the method further comprises correcting for representational or amplification bias between the two sets. In some embodiments, the method further comprises using a control or a set of control samples to correct for representational or amplification bias between the two sets. In some embodiments, the method further comprises determining copy number variation between the sets.
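One way to carry out the inference in [149] can be sketched in Python. The inference combines the number of unique families with the number of reads per family to estimate how many tagged parent molecules were present, including molecules that were never sequenced. This sketch swaps in one specific statistical model that the disclosure does not name: each parent molecule is assumed to yield a Poisson($\lambda$) number of reads. Under that model, observed family sizes follow a zero-truncated Poisson distribution with mean $\lambda/(1-e^{-\lambda})$. The sketch solves for $\lambda$ from the mean family size and scales the family count by $1/(1-e^{-\lambda})$. The function name and tolerance are hypothetical.

```python
import math


def estimate_parent_count(family_sizes, tol=1e-10):
    """Estimate the number of tagged parent molecules, including unsampled ones.

    family_sizes: reads observed per family (each >= 1)
    Returns (estimated parent count, fitted lambda) under the assumed
    zero-truncated Poisson model.
    """
    families = len(family_sizes)
    mean_size = sum(family_sizes) / families
    if mean_size <= 1.0:
        raise ValueError("mean family size must exceed 1 to identify lambda")
    # Mean of a zero-truncated Poisson, lam / (1 - exp(-lam)), increases with lam,
    # so bisection finds the lam that matches the observed mean family size.
    lo, hi = 1e-12, 2.0 * mean_size
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid / (1 - math.exp(-mid)) < mean_size:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    # Observed families are the fraction 1 - exp(-lam) of all parents.
    return families / (1 - math.exp(-lam)), lam
```

Real amplification is overdispersed relative to a Poisson model, so this estimate is best read as a lower-complexity illustration of the idea rather than a calibrated estimator.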
[150] In some embodiments, the method further comprises: d) determining a quantitative measure of polymorphic forms among the families; and e) based on the quantitative measure of polymorphic forms, inferring a quantitative measure of polymorphic forms in the inferred number of unique tagged parent polynucleotides. In some embodiments, the polymorphic forms include, but are not limited to: substitutions, insertions, deletions, inversions, microsatellite changes, transversions, translocations, fusions, methylation, hypermethylation, hydroxymethylation, acetylation, epigenetic variants, regulatory-associated variants, or protein binding sites.
[151] In some embodiments, the sets are derived from a common sample, and the method further comprises: d) inferring copy number variation for the plurality of sets based on a comparison of the inferred number of tagged parent polynucleotides in each set mapping to each of a plurality of mappable positions in a reference sequence. In some embodiments, the original number of polynucleotides in each set is further inferred. In some embodiments, at least a subset of the tagged parent polynucleotides in each set is non-uniquely tagged.
[152] The disclosure also provides a method for determining copy number variation in a sample that includes polynucleotides, the method comprising: a) providing at least two sets of first polynucleotides, each set mapping to a different mappable position in a reference sequence in a genome, and, for each set of first polynucleotides: (i) amplifying the polynucleotides to produce a set of amplified polynucleotides; (ii) sequencing a subset of the set of amplified polynucleotides to produce a set of sequencing reads; (iii) grouping the sequence reads sequenced from the amplified polynucleotides into families, each family amplified from the same first polynucleotide in the set; (iv) inferring a quantitative measure of families in the set; and b) determining copy number variation by comparing the quantitative measure of families in each set.
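Steps (iv) and b) of [152] can be sketched in Python: count the unique families in each set, then compare the counts across sets. This is a minimal sketch with two assumptions. The quantitative measure is simply the number of distinct family keys per set. The comparison divides each set's count by the count from a control sample, following the control-based correction mentioned in [149], and then scales to the median set. The function names and the set labels in the example are hypothetical.

```python
import statistics


def family_counts(reads_by_set):
    """Number of unique families per set.

    reads_by_set: dict set_name -> iterable of family keys, one per read
    """
    return {name: len(set(keys)) for name, keys in reads_by_set.items()}


def copy_number_ratios(sample_counts, control_counts):
    """Family-count ratio of each set relative to a control, scaled to the median set."""
    raw = {n: sample_counts[n] / control_counts[n] for n in sample_counts}
    med = statistics.median(raw.values())
    return {n: r / med for n, r in raw.items()}


counts = family_counts({"A": ["f1", "f1", "f2", "f3", "f4"],
                        "B": ["g1", "g2"],
                        "C": ["h1", "h2", "h2"]})
copy_number_ratios(counts, {"A": 2, "B": 2, "C": 2})
# {'A': 2.0, 'B': 1.0, 'C': 1.0}
```

Counting families rather than reads keeps a heavily amplified molecule from being counted many times, which is the point of grouping reads before comparing sets.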
[153] The disclosure also provides a method for inferring the frequency of sequence calls in a sample of polynucleotides, comprising: a) providing at least one set of first polynucleotides, wherein each set maps to a different mappable position in a reference sequence in one or more genomes, and, for each set of first polynucleotides: (i) amplifying the first polynucleotides to produce a set of amplified polynucleotides; (ii) sequencing a subset of the set of amplified polynucleotides to produce a set of sequencing reads; (iii) grouping the sequence reads into families, each family comprising the sequence reads of polynucleotides amplified from the same first polynucleotide; b) inferring, for each set of first polynucleotides, a call frequency for one or more bases in the set of first polynucleotides, wherein the inferring comprises: (i) assigning, for each family, a confidence score to each of a plurality of calls, the confidence score taking into account the frequency of the call among members of the family; and (ii) estimating the frequency of the one or more calls, taking into account the confidence scores of the one or more calls assigned to each family.
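Step b) of [153] can be sketched for a single base position in Python. This is a minimal illustration that assumes one simple choice of scores: a call's confidence score within a family is the fraction of that family's reads supporting it, and the call frequency is the mean of those scores over all families. The disclosure leaves the exact scoring open, and the function names are hypothetical.

```python
from collections import Counter


def call_confidences(family_calls):
    """Confidence of each base call within one family: its share of the family's reads."""
    counts = Counter(family_calls)
    return {base: n / len(family_calls) for base, n in counts.items()}


def estimate_call_frequency(families, base):
    """Mean, over families, of the confidence score assigned to `base`.

    families: list of per-family lists of base calls at one position
    """
    scores = [call_confidences(calls).get(base, 0.0) for calls in families]
    return sum(scores) / len(scores)


families = [["T", "T", "T", "T"], ["C", "C", "C"], ["C", "C", "C", "T"], ["C", "C"]]
estimate_call_frequency(families, "T")  # 0.3125
```

Each family carries equal weight, so the estimate counts parent molecules rather than reads. In this example, a plain read-level count would instead give T a frequency of 5/13 because of the large all-T family.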
[154] The disclosure also provides a method for communicating sequence information about at least one individual polynucleotide molecule, comprising: a) providing at least one individual polynucleotide molecule; b) encoding sequence information on the at least one individual polynucleotide molecule to produce a signal; c) passing at least part of the signal through a channel to produce a received signal comprising nucleotide sequence information about the at least one individual polynucleotide molecule, wherein the received signal comprises noise and/or distortion; d) decoding the received signal to produce a message comprising sequence information about the at least one individual polynucleotide molecule, wherein the decoding reduces noise and/or distortion about each individual polynucleotide in the message; and e) providing the message comprising the sequence information about the at least one individual polynucleotide molecule to a recipient.
[155] In some embodiments, the noise comprises incorrect nucleotide calls. In some embodiments, the distortion comprises uneven amplification of the individual polynucleotide molecule compared with other individual polynucleotide molecules. In some embodiments, the distortion results from amplification or sequencing bias. In some embodiments, the at least one individual polynucleotide molecule is a plurality of individual polynucleotide molecules, and the decoding produces a message about each molecule in the plurality. In some embodiments, the encoding comprises amplifying the at least one individual polynucleotide molecule, which has optionally been tagged, wherein the signal comprises a collection of amplified molecules. In some embodiments, the channel comprises a polynucleotide sequencer, and the received signal comprises sequence reads of a plurality of polynucleotides amplified from the at least one individual polynucleotide molecule. In some embodiments, the decoding comprises grouping the sequence reads of the amplified molecules from each of the at least one individual polynucleotide molecule. In some embodiments, the decoding consists of a probabilistic or statistical method of filtering the generated sequence signal.
[156] In some embodiments, the polynucleotides are derived from tumor genomic DNA or RNA. In some embodiments, the polynucleotides are derived from cell-free polynucleotides, exosomal polynucleotides, bacterial polynucleotides, or viral polynucleotides. In some embodiments of any of the methods herein, the method further comprises detecting and/or associating affected molecular pathways. In some embodiments of any of the methods herein, the method further comprises serial monitoring of an individual's health or disease status. In some embodiments, the phylogeny of a genome associated with a disease within an individual is inferred. In some embodiments, any of the methods described herein further comprises diagnosing, monitoring, or treating a disease. In some embodiments, the treatment regimen is selected or modified based on the detected polymorphic forms, CNVs, or associated pathways. In some embodiments, the treatment comprises a combination therapy. In some embodiments, the diagnosis further comprises localizing the disease using one or more radiographic procedures, such as computed tomography, PET-CT, MRI, ultrasound, or ultrasound with microbubbles.
[157] The disclosure also provides a computer-readable medium comprising non-transitory machine-executable code that, upon execution by a computer processor, implements a method comprising: selecting predefined regions in a genome; accessing sequence reads and enumerating the number of sequence reads in the predefined regions; normalizing the number of sequence reads across the predefined regions; and determining the percentage of copy number variation in the predefined regions.
[158] The disclosure also provides a computer-readable medium comprising non-transitory machine-executable code that, upon execution by a computer processor, implements a method comprising: accessing a data file comprising a plurality of sequencing reads; filtering out reads that fail to meet a set quality threshold; mapping the sequence reads derived from the sequencing to a reference sequence; identifying a subset of mapped sequence reads that align with a variant of the reference sequence at each mappable base position; for each mappable base position, calculating the ratio of (a) the number of mapped sequence reads that include a variant relative to the reference sequence to (b) the total number of sequence reads at that mappable base position; normalizing the ratios or variant frequency for each mappable base position and identifying the regions with potential rare variant(s) or other genetic alteration(s); and comparing the resulting number for each of the regions with potential rare variant(s) or mutation(s) to numbers derived in the same way from a reference sample.
[159] The disclosure also provides a computer-readable medium comprising non-transitory machine-executable code that, upon execution by a computer processor, implements a method comprising: a) accessing a data file comprising a plurality of sequencing reads, wherein the sequence reads are derived from a set of amplified progeny polynucleotides from at least one set of tagged parent polynucleotides; and b) collapsing the set of sequencing reads to generate a set of consensus sequences, each consensus sequence corresponding to a unique polynucleotide among the set of tagged parent polynucleotides.
[160] The disclosure also provides a computer-readable medium comprising non-transitory machine-executable code that, upon execution by a computer processor, implements a method, the method comprising: a) accessing a data file comprising a plurality of sequencing readings, wherein the sequence readings are derived from a set of progeny polynucleotides amplified from at least one set of labeled parent polynucleotides; b) collapsing the set of sequencing readings to generate a set of consensus sequences, each consensus sequence corresponding to a unique polynucleotide among the set of labeled parent polynucleotides; and c) filtering out consensus sequences that fail to meet a quality threshold.
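Step c) of [160] can be sketched by attaching a quality score to each consensus sequence and discarding those below a threshold. In this hypothetical Python example the score is the lowest per-position agreement fraction within the family, and families with fewer than two readings are also discarded; both the scoring rule and the thresholds (min_reads, min_agreement) are assumptions for illustration.

```python
from collections import Counter


def consensus_with_score(readings):
    """Majority-vote consensus plus the lowest per-position agreement fraction."""
    calls, agreement = [], []
    for column in zip(*readings):
        base, count = Counter(column).most_common(1)[0]
        calls.append(base)
        agreement.append(count / len(column))
    return "".join(calls), min(agreement)


def filter_consensus(families, min_reads=2, min_agreement=0.8):
    """Keep consensus sequences from families that meet both quality thresholds."""
    kept = {}
    for family_id, readings in families.items():
        if len(readings) < min_reads:
            continue  # a single reading gives no error correction
        sequence, score = consensus_with_score(readings)
        if score >= min_agreement:
            kept[family_id] = sequence
    return kept


families = {
    "tagA": ["ACGT"] * 4 + ["ACTT"],  # 4 of 5 agree at the third position -> 0.8
    "tagB": ["ACGT", "TCGA"],         # 1 of 2 agree at two positions -> 0.5
    "tagC": ["ACGT"],                 # single reading
}
print(filter_consensus(families))  # {'tagA': 'ACGT'}
```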
[161] The disclosure also provides a computer-readable medium comprising non-transitory machine-executable code that, upon execution by a computer processor, implements a method, the method comprising: a) accessing a data file comprising a plurality of sequencing readings, wherein the sequence readings are derived from a set of progeny polynucleotides amplified from at least one set of labeled parent polynucleotides; and i) collapsing the sequence readings by: (1) grouping the sequence readings sequenced from the amplified progeny polynucleotides into families, each family being amplified from the same labeled parent polynucleotide, and, optionally, (2) determining a quantitative measurement of the sequence readings in each family.
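The grouping step of [161] can be sketched by keying each sequence reading by its label together with its mapped start and stop positions, so that readings amplified from the same labeled parent polynucleotide fall into one family. The tuple layout (label, start, stop, sequence) is a hypothetical representation, and using the label plus start/stop coordinates as the family key is an assumption of this sketch.

```python
from collections import defaultdict


def group_into_families(readings):
    """Group readings into families and count the readings in each family.

    readings: iterable of (label, start, stop, sequence) tuples (hypothetical layout).
    """
    families = defaultdict(list)
    for label, start, stop, sequence in readings:
        families[(label, start, stop)].append(sequence)
    # Optional step (2): a quantitative measurement of readings per family.
    sizes = {key: len(seqs) for key, seqs in families.items()}
    return dict(families), sizes


readings = [
    ("AAT", 100, 250, "ACGT"),
    ("AAT", 100, 250, "ACGT"),
    ("GCA", 100, 250, "ACGA"),  # same coordinates, different label: a different parent
    ("AAT", 300, 420, "TTGC"),  # same label, different coordinates: a different parent
]
families, sizes = group_into_families(readings)
print(sizes)
```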
[162] In some embodiments, the executable code, upon execution by a computer processor, additionally performs the steps of: b) determining a quantitative measurement of unique families; and c) based on (1) the quantitative measurement of unique families and (2) the quantitative measurement of the sequence readings in each group, inferring a measurement of the unique labeled parent polynucleotides in the set.
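One way to carry out step c) of [162] is to fit a sampling model to the family sizes. The Python sketch below assumes, hypothetically, that the number of readings sequenced from each labeled parent polynucleotide follows a Poisson distribution with mean lambda; parents with zero readings are then unobserved, so the observed mean family size is lambda / (1 - exp(-lambda)). Solving for lambda and dividing the number of observed families by 1 - exp(-lambda) gives an estimate of the total number of unique labeled parent polynucleotides. The Poisson model is an assumption of this example, not a model stated in the disclosure.

```python
import math


def estimate_unique_parents(family_sizes, iterations=200):
    """Estimate (lambda, number of unique parents) from observed family sizes.

    Assumes readings per parent ~ Poisson(lambda), so only parents with at
    least one reading are observed (a zero-truncated Poisson).
    """
    n_families = len(family_sizes)
    mean_size = sum(family_sizes) / n_families
    if mean_size <= 1.0:
        raise ValueError("mean family size must exceed 1 to estimate lambda")

    def truncated_mean(lam):
        # E[size | size >= 1] = lam / (1 - exp(-lam)); expm1 keeps this accurate.
        return lam / -math.expm1(-lam)

    # truncated_mean rises monotonically from 1 (lam -> 0) and always exceeds lam,
    # so the root lies in (0, mean_size]; solve by bisection.
    lo, hi = 0.0, mean_size
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if truncated_mean(mid) < mean_size:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2.0
    # Observed families are the fraction 1 - exp(-lam) of all parents.
    return lam, n_families / -math.expm1(-lam)


lam, n_parents = estimate_unique_parents([1, 1, 2, 3, 1, 4, 2, 1, 2, 3])
print(f"lambda ~ {lam:.3f}, unique labeled parents ~ {n_parents:.1f}")
```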
[163] In some embodiments, the executable code, upon execution by a computer processor, additionally performs the steps of: d) determining a quantitative measurement of polymorphic forms among the families; and e) based on the quantitative measurement of the polymorphic forms, inferring a quantitative measurement of the polymorphic forms in the inferred number of unique labeled parent polynucleotides.
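Steps d) and e) of [163] can be illustrated by counting the polymorphic forms (here, alleles at one locus) across families and scaling the observed proportions to the inferred number of unique labeled parent polynucleotides. This Python sketch assumes, hypothetically, one consensus allele per family and that unobserved parent molecules carry each form in the same proportion as the observed families.

```python
from collections import Counter


def forms_in_parents(family_alleles, inferred_parents):
    """Scale per-family allele counts to the inferred number of unique parents.

    family_alleles:   one consensus allele per family at a single locus
    inferred_parents: estimated number of unique labeled parent polynucleotides
    """
    counts = Counter(family_alleles)  # step d): forms among families
    total = len(family_alleles)
    # Step e): assume unobserved parents share the observed proportions.
    return {allele: inferred_parents * n / total for allele, n in counts.items()}


alleles = ["A"] * 95 + ["T"] * 5
print(forms_in_parents(alleles, inferred_parents=120))  # {'A': 114.0, 'T': 6.0}
```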
[164] The disclosure also provides a computer-readable medium comprising non-transitory machine-executable code that, upon execution by a computer processor, implements a method, the method comprising: a) accessing a data file comprising a plurality of sequencing readings, wherein the sequence readings are derived from a set of progeny polynucleotides amplified from at least one set of labeled parent polynucleotides, and grouping the sequence readings sequenced from the amplified polynucleotides into families, each family being amplified from the same first polynucleotide in the set; b) inferring a quantitative measurement of families in the group; and c) determining copy number variation by comparing the quantitative measurement of families in each set.
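The comparison in step c) of [164] can be sketched by counting unique families (rather than raw readings) per region in a test set and a reference set and taking their ratio; counting families instead of readings is intended to reduce amplification distortion. In this Python sketch the region names are hypothetical, and the assumptions that most regions are copy-neutral (so the median ratio reflects overall depth differences) and that the genome is diploid are illustrative choices.

```python
import statistics


def copy_number_from_families(test_families, reference_families, ploidy=2):
    """Estimate copy number per region from unique-family counts.

    test_families / reference_families: region -> number of unique families
    """
    ratios = {r: test_families[r] / reference_families[r] for r in test_families}
    # Assume most regions are copy-neutral: the median ratio reflects depth only.
    scale = statistics.median(ratios.values())
    return {r: ploidy * ratio / scale for r, ratio in ratios.items()}


test = {"r1": 200, "r2": 210, "r3": 300, "r4": 190}
reference = {"r1": 100, "r2": 105, "r3": 100, "r4": 95}
print(copy_number_from_families(test, reference))
# {'r1': 2.0, 'r2': 2.0, 'r3': 3.0, 'r4': 2.0}
```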
[165] The disclosure also provides a computer-readable medium comprising non-transitory machine-executable code that, upon execution by a computer processor, implements a method, the method comprising: a) accessing a data file comprising a plurality of sequencing readings, wherein the sequence readings are derived from a set of progeny polynucleotides amplified from at least one set of labeled parent polynucleotides, and grouping the sequence readings into families, each family comprising the sequence readings of polynucleotides amplified from the same first polynucleotide; b) inferring, for each set of first polynucleotides, a call frequency for one or more bases in the set of first polynucleotides, wherein the inferring comprises: c) assigning, for each family, a confidence score for each of a plurality of calls, the confidence score taking into account a frequency of the call among family members; and d) estimating the frequency of the one or more calls taking into account the confidence scores of the one or more calls assigned to each family.
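Steps c) and d) of [165] can be sketched with a simple scoring rule: the confidence score of a call within a family is the fraction of that family's readings supporting it, and the population frequency of a call is the mean of its confidence scores across families. This weighting scheme is one hypothetical choice of scoring function, shown only to make the two steps concrete.

```python
from collections import Counter


def call_confidences(family_calls):
    """Confidence score per base call = its frequency among the family's readings."""
    counts = Counter(family_calls)
    return {base: n / len(family_calls) for base, n in counts.items()}


def estimate_call_frequency(families, base):
    """Estimate a call's frequency across families, weighting by per-family confidence."""
    scores = [call_confidences(calls).get(base, 0.0) for calls in families]
    return sum(scores) / len(scores)


# Base calls at one locus, one list per family (each family = one parent molecule).
families = [
    ["A", "A", "A", "A"],
    ["T", "T", "T", "A"],  # mostly T: consistent with a variant parent plus one error
    ["A", "A", "A", "A"],
    ["A", "A", "A", "A"],
]
print(estimate_call_frequency(families, "T"))  # 0.1875
print(estimate_call_frequency(families, "A"))  # 0.8125
```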
[166] The disclosure also provides a computer-readable medium comprising non-transitory machine-executable code that, upon execution by a computer processor, implements a method, the method comprising: a) accessing a data file comprising a received signal that comprises encoded sequence information from at least one individual polynucleotide molecule, wherein the received signal comprises noise and/or distortion; b) decoding the received signal to produce a message comprising sequence information about the at least one individual polynucleotide molecule, wherein the decoding reduces noise and/or distortion about each individual polynucleotide in the message; and c) writing the message comprising the sequence information about the at least one individual polynucleotide molecule to a computer file.
[167] The disclosure also provides a computer-readable medium comprising non-transitory machine-executable code that, upon execution by a computer processor, implements a method, the method comprising: a) accessing a data file comprising a plurality of sequencing readings, wherein the sequence readings are derived from a set of progeny polynucleotides amplified from at least one set of labeled parent polynucleotides; b) collapsing the set of sequencing readings to generate a set of consensus sequences, each consensus sequence corresponding to a unique polynucleotide among the set of labeled parent polynucleotides; and c) filtering out consensus sequences that fail to meet a quality threshold.
[168] The disclosure also provides a computer-readable medium comprising non-transitory machine-executable code that, upon execution by a computer processor, implements a method, the method comprising: a) accessing a data file comprising a plurality of sequencing readings, wherein the sequence readings are derived from a set of progeny polynucleotides amplified from at least one set of labeled parent polynucleotides; and b) collapsing the sequence readings by: (i) grouping the sequence readings sequenced from the amplified progeny polynucleotides into families, each family being amplified from the same labeled parent polynucleotide; and (ii) optionally, determining a quantitative measurement of the sequence readings in each family.
[169] In some embodiments, the executable code, upon execution by a computer processor, additionally performs the steps of: b) determining a quantitative measurement of the unique families; and c) based on (1) the quantitative measurement of the unique families and (2) the quantitative measurement of the sequence readings in each group, inferring a measurement of the unique labeled parent polynucleotides in the set.
[170] In some embodiments, the executable code, upon execution by a computer processor, additionally performs the steps of: e) determining a quantitative measurement of polymorphic forms among the families; and f) based on the quantitative measurement of the polymorphic forms, inferring a quantitative measurement of the polymorphic forms in the inferred number of unique labeled parent polynucleotides.
[171] In some embodiments, the executable code, upon execution by a computer processor, additionally performs the step of: d) inferring copy number variation for the plurality of sets based on a comparison of the inferred number of labeled parent polynucleotides in each set that map to each of a plurality of reference sequences.
[172] The disclosure also provides a computer-readable medium comprising non-transitory machine-executable code that, upon execution by a computer processor, implements a method, the method comprising: a) accessing a data file comprising a plurality of sequencing readings, wherein the sequence readings are derived from a set of progeny polynucleotides amplified from at least one set of labeled parent polynucleotides; b) grouping the sequence readings sequenced from the amplified polynucleotides into families, each family being amplified from the same first polynucleotide in the set; c) inferring a quantitative measurement of families in the group; and d) determining copy number variation by comparing the quantitative measurement of families in each set.
[173] The disclosure also provides a computer-readable medium comprising non-transitory machine-executable code that, upon execution by a computer processor, implements a method, the method comprising: a) accessing a data file comprising a plurality of sequencing readings, wherein the sequence readings are derived from a set of progeny polynucleotides amplified from at least one set of labeled parent polynucleotides, and grouping the sequence readings into families, each family comprising the sequence readings of polynucleotides amplified from the same first polynucleotide; and inferring, for each set of first polynucleotides, a call frequency for one or more bases in the set of first polynucleotides, wherein the inferring comprises: (i) assigning, for each family, a confidence score for each of a plurality of calls, the confidence score taking into account a frequency of the call among the other family members; and (ii) estimating the frequency of the one or more calls taking into account the confidence scores of the one or more calls assigned to each family.
[174] The disclosure also provides a composition comprising between 100 and 100,000 human haploid genome equivalents of cfDNA polynucleotides, where the polynucleotides are tagged with between 2 and 1,000,000 unique identifiers.
[175] In some embodiments, the composition comprises between 1,000 and 50,000 human haploid genome equivalents of cfDNA polynucleotides, wherein the polynucleotides are tagged with between 2 and 1,000 unique identifiers. In some embodiments, the unique identifiers comprise nucleotide barcodes. The disclosure also provides a method comprising: a) providing a sample comprising between 100 and 100,000 human haploid genome equivalents of cfDNA polynucleotides; and b) tagging the polynucleotides with between 2 and 1,000,000 unique identifiers.
[176] The disclosure also provides a method comprising: a) providing a sample comprising a plurality of human haploid genome equivalents of fragmented polynucleotides; b) determining z, where z is a measure of central tendency (for example, mean, median or mode) of the expected number of duplicate polynucleotides starting at any position in the genome, wherein duplicate polynucleotides have the same start and stop positions; and c) tagging the polynucleotides in the sample with n unique identifiers, where n is between 2 and 100,000*z, 2 and 10,000*z, 2 and 1,000*z, or 2 and 100*z. The disclosure also provides a method comprising: a) providing at least one set of labeled parent polynucleotides and, for each set of labeled parent polynucleotides, b) producing a plurality of sequence readings for each labeled parent polynucleotide in the set to produce a set of sequencing readings; and c) collapsing the set of sequencing readings to generate a set of consensus sequences, each consensus sequence corresponding to a unique polynucleotide among the set of labeled parent polynucleotides.
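The choice of n relative to z in [176] can be motivated with a small calculation. If z fragments share the same start and stop positions and each independently receives one of n identifiers uniformly at random (a simplifying assumption of this sketch), the probability that all z receive distinct identifiers is the product of (n - i)/n for i = 0, ..., z - 1, and the expected number of distinct identifiers among them is n(1 - (1 - 1/n)^z). The Python functions below compute both quantities.

```python
def prob_all_distinct(z, n):
    """P(z same-position fragments all receive different identifiers out of n)."""
    if z > n:
        return 0.0  # pigeonhole: a collision is certain
    p = 1.0
    for i in range(z):
        p *= (n - i) / n
    return p


def expected_distinct(z, n):
    """Expected number of distinct identifiers among z uniformly tagged fragments."""
    return n * (1.0 - (1.0 - 1.0 / n) ** z)


for n in (2, 8, 100, 1000):
    print(n, round(prob_all_distinct(4, n), 4), round(expected_distinct(4, n), 3))
```

With z = 4, two identifiers can never distinguish all four fragments, while increasing n makes collisions progressively rarer, which is consistent with sizing n as a multiple of z.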
[177] The disclosure also provides a system that comprises a computer-readable medium that comprises machine-executable code as described in this document. The disclosure also provides a system comprising a computer-readable medium that comprises machine-executable code which, upon execution by a computer processor, implements a method as described in this document.
[178] Additional aspects and advantages of the present disclosure will become readily apparent to those skilled in the art from the following detailed description, in which only illustrative embodiments of the present disclosure are shown and described. As will be realized, the present disclosure is capable of other and different embodiments, and its several details are capable of modifications in various obvious respects, all without departing from the disclosure. Accordingly, the drawings and description are to be regarded as illustrative in nature and not as restrictive.
INCORPORATION BY REFERENCE [179] All publications, patents and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent or patent application were specifically and individually indicated to be incorporated by reference.
BRIEF DESCRIPTION OF THE DRAWINGS [180] The novel features of the system and methods of this disclosure are set forth with particularity in the appended claims. A better understanding of the features and advantages of this disclosure will be obtained by reference to the following detailed description, which sets forth illustrative embodiments in which the principles of the system and methods of this disclosure are used, and to the accompanying drawings, of which:
Figure 1 is a flowchart representation of a method of detecting copy number variation using a single sample;
Figure 2 is a flowchart representation of a method of detecting copy number variation using paired samples;
Figure 3 is a flowchart representation of a method for detecting rare mutations (for example, single nucleotide variants);
Figure 4A is a graphical copy number variation detection report generated from a normal, non-cancerous individual;
Figure 4B is a graphical copy number variation detection report generated from an individual with prostate cancer;
Figure 4C is a schematic representation of Internet-enabled access to reports generated from copy number variation analysis of an individual with prostate cancer;
Figure 5A is a graphical copy number variation detection report generated from an individual in remission from prostate cancer;
Figure 5B is a graphical copy number variation detection report generated from an individual with recurrent prostate cancer;
Figure 6A is a graphical report of rare mutation detection (for example, single nucleotide variant detection) generated from various mixing experiments using DNA samples that contain both wild-type and mutant copies of MET and TP53;
Figure 6B is a logarithmic graphical representation of rare mutation detection results (for example, single nucleotide variant detection); observed versus expected percent cancer measurements are shown for various mixing experiments using DNA samples that contain both wild-type and mutant copies of MET, HRAS and TP53;
Figure 7A is a graphical report of the percentage of two rare mutations (for example, single nucleotide variants) in two genes, PIK3CA and TP53, in an individual with prostate cancer compared to a reference (control);
Figure 7B is a schematic representation of Internet-enabled access to reports generated from rare mutation analysis (for example, single nucleotide variant analysis) of an individual with prostate cancer;
Figure 8 is a flowchart representation of a method for analyzing genetic material;
Figure 9 is a flowchart representation of a method for decoding the information in a set of sequence readings to produce, with reduced noise and/or distortion, a representation of the information in a set of labeled parent polynucleotides;
Figure 10 is a flowchart representation of a method for reducing distortion in determining CNV from a set of sequence readings;
Figure 11 is a flowchart representation of a method for estimating the frequency of a base or sequence of bases at a locus in a labeled parent polynucleotide population from a set of sequence readings;
Figure 12 shows a method for communicating the sequence information;
Figure 13 shows the lowest allele frequencies detected across an entire 70 kb panel in a 0.3% LNCaP cfDNA titration using Digital Sequencing and standard sequencing workflows; standard analog sequencing (Figure 13A) masks all true-positive rare variants in significant noise due to PCR and sequencing errors despite Q30 filtering; Digital Sequencing (Figure 13B) eliminates all PCR and sequencing noise, revealing the true mutations without any false positives: the green circles are SNP points in normal cfDNA and the red circles are detected LNCaP mutations;
Figure 14 shows the LNCaP cfDNA titration;
Figure 15 shows a computer system that is programmed or otherwise configured to implement various methods of the present disclosure.
DETAILED DESCRIPTION OF THE INVENTION
I. Overview [181] The present disclosure provides a system and method for detecting rare mutations (for example, single or multiple nucleotide variations) and copy number variation in cell-free polynucleotides. Generally, the systems and methods comprise sample preparation, or the extraction and isolation of cell-free polynucleotide sequences from a body fluid; subsequent sequencing of the cell-free polynucleotides by techniques known in the art; and application of bioinformatics tools to detect rare mutations and copy number variations as compared to a reference. The systems and methods may also contain a database or collection of different rare mutation or copy number variation profiles of different diseases, to be used as additional references to assist in the detection of rare mutations (for example, single nucleotide variation profiling), copy number variation profiling or general genetic profiling of a disease.
[182] The systems and methods can be particularly useful in the analysis of cell-free DNA. In some cases, cell-free DNA is extracted and isolated from a readily accessible body fluid such as blood. For example, cell-free DNA can be extracted using a variety of methods known in the art, including, but not limited to, isopropanol precipitation and/or silica-based purification. Cell-free DNA can be extracted from any number of individuals, such as individuals without cancer, individuals at risk of cancer, or individuals known to have cancer (for example, as determined by other means).
[183] Following the isolation/extraction step, any of several different sequencing operations can be performed on the cell-free polynucleotide sample. Samples can be processed before sequencing with one or more reagents (for example, enzymes, unique identifiers (for example, barcodes), probes, etc.). In some cases, if the sample is processed with a unique identifier such as a barcode, the samples or sample fragments can be labeled individually or in subgroups with the unique identifier. The labeled sample can then be used in a downstream application, such as a sequencing reaction, by which individual molecules can be traced back to their parent molecules.
[184] After sequencing data of the cell-free polynucleotides are collected, one or more bioinformatics processes can be applied to the sequence data to detect genetic features or aberrations such as copy number variation, rare mutations (for example, single or multiple nucleotide variations) or changes in epigenetic markers, including, but not limited to, methylation profiles. In some cases in which copy number variation analysis is desired, the sequence data can be: 1) aligned with a reference genome; 2) filtered and mapped; 3) partitioned into windows or bins of the sequence; 4) counted, as coverage readings, for each window; 5) normalized, as coverage readings, using a stochastic or statistical modeling algorithm; and 6) used to generate an output file reflecting the different copy number states at various positions in the genome. In other cases in which rare mutation analysis is desired, the sequence data can be: 1) aligned with a reference genome; 2) filtered and mapped; 3) used to calculate the frequency of variant bases based on the coverage readings for each specific base; 4) used to normalize the variant base frequency with a stochastic, statistical or probabilistic modeling algorithm; and 5) used to generate an output file reflecting the mutation states at various positions in the genome.
[185] A variety of different reactions and/or operations can occur within the systems and methods disclosed herein, including, but not limited to: nucleic acid sequencing, nucleic acid quantification, sequencing optimization, detection of gene expression, quantification of gene expression, genomic profiling, cancer profiling or analysis of expressed markers. In addition, the systems and methods have numerous medical applications. For example, they can be used for the identification, detection, diagnosis, treatment, staging or risk prediction of various genetic and non-genetic diseases, including cancer. They can be used to assess an individual's response to different treatments for such genetic or non-genetic diseases, or to provide information regarding disease prognosis and progression.
[186] Polynucleotide sequencing can be compared to a problem in communication theory. An initial individual polynucleotide or group of polynucleotides is regarded as an original message. Labeling and/or amplification can be regarded as encoding the original message into a signal. Sequencing can be regarded as a communication channel. The output of a sequencer, for example, the sequence readings, can be regarded as a received signal. Bioinformatics processing can be regarded as a receiver that decodes the received signal to produce a transmitted message, for example, a nucleotide sequence or sequences. The received signal may include artifacts, such as noise and distortion. Noise can be regarded as an unwanted random addition to a signal. Distortion can be regarded as an alteration in the amplitude of a signal or of a portion of a signal.
[187] Noise can be introduced through errors in copying and/or reading a polynucleotide. For example, in a sequencing process, a single polynucleotide can first be subjected to amplification. Amplification can introduce errors, so that a subset of the amplified polynucleotides may contain, at a particular locus, a base that is not the same as the original base at that locus. Furthermore, in the reading process, a base at any particular locus may be read incorrectly. As a consequence, the collection of sequence readings may include a certain percentage of base calls at a locus that are not the same as the original base. In typical sequencing technologies, this error rate can be in the single digits, for example, 2% to 3%. When a collection of molecules that are all presumed to have the same sequence is sequenced, this noise is small enough that the original base can be identified with high reliability.
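The claim that a small per-reading error rate still allows the original base to be identified reliably can be checked with a binomial calculation. Assuming, pessimistically, that errors are independent and that every erroneous reading reports the same wrong base, a majority-vote call over k readings of the same molecule is wrong only when more than half of the readings are erroneous. The Python sketch below computes that probability for an error rate e.

```python
from math import comb


def majority_error_probability(k, e):
    """P(more than half of k independent readings carry the same wrong base).

    Worst-case simplification: all errors produce the same wrong base.
    Use odd k; for even k, exact ties are not counted as errors.
    """
    return sum(comb(k, j) * e**j * (1 - e) ** (k - j) for j in range(k // 2 + 1, k + 1))


for k in (1, 3, 5, 7):
    print(k, majority_error_probability(k, 0.03))
```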
[188] However, if a collection of parent polynucleotides includes a subset of polynucleotides having sequence variants at a particular locus, noise can be a significant problem. This may be the case, for example, when cell-free DNA includes not only germline DNA but also DNA from another source, such as fetal DNA or DNA from cancer cells. In that case, if the frequency of molecules with sequence variants is in the same range as the frequency of errors introduced by the sequencing process, the true sequence variants may not be distinguishable from noise. This could interfere, for example, with the detection of sequence variants in a sample.
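The difficulty described in [188] can be quantified with a binomial tail probability: given the sequencing depth at a locus and a per-base error rate, how likely is it that errors alone produce at least as many variant readings as a true variant would? This Python sketch assumes independent errors at a single error rate and works in log space to avoid overflow at high depth; the depth, read counts and error rates in the usage lines are hypothetical.

```python
import math


def error_tail_probability(depth, variant_reads, error_rate):
    """P(at least `variant_reads` mismatching readings arise from errors alone).

    Upper tail of Binomial(depth, error_rate), summed from log-space terms.
    Requires 0 < error_rate < 1.
    """
    log_p, log_q = math.log(error_rate), math.log1p(-error_rate)
    total = 0.0
    for j in range(variant_reads, depth + 1):
        log_term = (math.lgamma(depth + 1) - math.lgamma(j + 1)
                    - math.lgamma(depth - j + 1) + j * log_p + (depth - j) * log_q)
        total += math.exp(log_term)
    return total


# A 1% variant at depth 1000 gives about 10 variant readings.
print(error_tail_probability(1000, 10, 0.01))    # errors alone often reach this count
print(error_tail_probability(1000, 10, 0.0001))  # a far lower error rate rarely does
```

In this model, lowering the effective error rate is what separates a low-frequency variant from noise.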
[189] Distortion can manifest in the sequencing process as a difference in signal strength, for example, in the total number of sequence readings produced by molecules present in a parent population at the same frequency. Distortion can be introduced, for example, through amplification bias, GC bias or sequencing bias. This could interfere with the detection of copy number variation in a sample. GC bias results in the unequal representation of regions rich or poor in GC content in the sequence readings.
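One common way to reduce GC bias is to compare each genomic bin with other bins of similar GC content. The Python sketch below, offered as one hypothetical approach rather than the method of this disclosure, assigns bins to GC strata, divides each bin's read count by the median count of its stratum, and rescales by the overall median, so that bins with equal copy number but different GC content end up with similar corrected counts.

```python
import statistics
from collections import defaultdict


def gc_correct(bins, n_strata=10):
    """Median-based GC correction of per-bin read counts.

    bins: list of (read_count, gc_fraction) pairs, with gc_fraction in [0, 1].
    """
    def stratum(gc):
        return min(int(gc * n_strata), n_strata - 1)

    by_stratum = defaultdict(list)
    for count, gc in bins:
        by_stratum[stratum(gc)].append(count)
    stratum_median = {s: statistics.median(c) for s, c in by_stratum.items()}
    overall_median = statistics.median(count for count, _ in bins)

    corrected = []
    for count, gc in bins:
        m = stratum_median[stratum(gc)]
        # Rescale each bin so its GC stratum shares the overall median depth.
        corrected.append(count * overall_median / m if m > 0 else 0.0)
    return corrected


# GC-rich bins read about 1.5x deeper here purely because of GC bias.
bins = [(100, 0.35), (102, 0.36), (98, 0.34), (150, 0.62), (153, 0.61), (147, 0.63)]
print(gc_correct(bins))
```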
[190] This invention provides methods for reducing sequencing artifacts, such as noise and/or distortion, in a polynucleotide sequencing process. Grouping sequence readings into families derived from original individual molecules can reduce noise and/or distortion from a single individual molecule or from a group of molecules. For a single molecule, grouping readings into a family reduces distortion, for example, by indicating that many sequence readings actually represent a single molecule rather than many different molecules. Collapsing sequence readings into a consensus sequence is one way to reduce noise in the message received from a molecule. Using probabilistic functions that convert received frequencies is another way. With respect to a group of molecules, grouping readings into families and determining a quantitative measurement of the families reduces distortion, for example, in the quantity of molecules at each of a plurality of different loci. Again, collapsing the sequence readings of different families into consensus sequences eliminates errors introduced by amplification and/or sequencing. In addition, determining base call frequencies based on probabilities derived from family information also reduces noise in the message received from a group of molecules.
[191] Methods for reducing noise and/or distortion in a sequencing process are known. They include, for example, filtering sequences, for example, by requiring them to meet a quality threshold, or reducing GC bias. Such methods are typically performed on the collection of sequence readings output by a sequencer and can be performed on a reading-by-reading basis, without regard to family structure (sub-collections of sequences derived from a single original parent molecule). Certain methods of this invention reduce noise and distortion within families of sequence readings, that is, by operating on sequence readings grouped into families derived from a single parent polynucleotide molecule. Reducing signal artifacts at the family level can leave significantly less noise and distortion in the final message than artifact reduction performed at the level of individual sequence readings or on the sequencer output as a whole.
[192] The present disclosure additionally provides methods and systems for high-sensitivity detection of genetic variation in a sample of initial genetic material. The methods involve using one or both of the following tools. First, the efficient conversion of individual polynucleotides in a sample of initial genetic material into sequence-ready labeled parent polynucleotides, so as to increase the probability that individual polynucleotides in the sample of initial genetic material are represented in a sequence-ready sample. This can produce sequence information about more of the polynucleotides in the initial sample. Second, the high-throughput generation of consensus sequences for labeled parent polynucleotides by high-rate sampling of progeny polynucleotides amplified from the labeled parent polynucleotides, and the collapsing of the generated sequence readings into consensus sequences that represent the sequences of the labeled parent polynucleotides. This can reduce the noise introduced by amplification bias and/or sequencing errors and can increase the sensitivity of detection. Collapsing is performed on a plurality of sequence readings, generated either from readings of amplified molecules or from multiple readings of a single molecule.
[193] Sequencing methods typically involve sample preparation, sequencing of the polynucleotides in the prepared sample to produce sequence readings, and bioinformatic manipulation of the sequence readings to produce quantitative and/or qualitative genetic information about the sample. Sample preparation involves converting polynucleotides in a sample into a form compatible with the sequencing platform used. This conversion may involve tagging polynucleotides. In certain embodiments of this invention, the tags comprise polynucleotide sequence tags. The conversion methodologies used in sequencing may not be 100% efficient. For example, it is not uncommon to convert the polynucleotides in a sample with a conversion efficiency of about 1 to 5%, that is, only about 1 to 5% of the polynucleotides in a sample are converted into labeled polynucleotides. Polynucleotides that are not converted into labeled molecules are not represented in a labeled library for sequencing. Consequently, polynucleotides carrying genetic variants that are represented at a low frequency in the initial genetic material may not be represented in the labeled library and, therefore, may not be sequenced or detected. Increasing the conversion efficiency increases the probability that a rare polynucleotide in the initial genetic material will be represented in the labeled library and, consequently, detected by sequencing. In addition, rather than directly addressing the problem of low conversion efficiency in library preparation, most protocols to date call for more than 1 microgram of DNA as input material. However, when the input sample material is limited or the detection of polynucleotides with low representation is desired, high conversion efficiency can enable the sample to be sequenced efficiently and/or such polynucleotides to be adequately detected.
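The effect of conversion efficiency on rare-variant representation can be estimated with a Poisson approximation. Assuming, for illustration, that each haploid genome equivalent carries one copy of the locus and that each variant-bearing molecule is converted independently with probability equal to the conversion efficiency, the expected number of labeled variant molecules is (genome equivalents) x (variant fraction) x (efficiency), and the probability that at least one is present in the labeled library is 1 - exp(-expected). The input values below are hypothetical.

```python
import math


def prob_variant_in_library(genome_equivalents, variant_fraction, conversion_efficiency):
    """Poisson approximation of P(at least one variant molecule is converted)."""
    expected = genome_equivalents * variant_fraction * conversion_efficiency
    return -math.expm1(-expected)  # equals 1 - exp(-expected), accurate for small values


# 1,000 haploid genome equivalents carrying a variant at 0.1% (about one molecule).
for efficiency in (0.01, 0.05, 0.5, 0.9):
    print(efficiency, round(prob_variant_in_library(1000, 0.001, efficiency), 3))
```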
[194] This disclosure provides methods for converting initial polynucleotides into labeled polynucleotides with a conversion efficiency of at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 80% or at least 90%. The methods involve, for example, using any of blunt-end ligation, sticky-end ligation, molecular inversion probes, PCR, ligation-based PCR, multiplex PCR, single-strand ligation and single-strand circularization. The methods may also involve limiting the amount of initial genetic material. For example, the amount of initial genetic material can be less than 1 µg, less than 100 ng or less than 10 ng. These methods are described in more detail herein.
[195] Obtaining accurate quantitative and qualitative information about the polynucleotides in a labeled library can result in a more sensitive characterization of the initial genetic material. Typically, polynucleotides in a labeled library are amplified and the resulting amplified molecules are sequenced. Depending on the throughput of the sequencing platform used, only a subset of the molecules in the amplified library produce sequence readings. So, for example, the number of amplified molecules sampled for sequencing may be only about 50% of the unique polynucleotides in the labeled library. In addition, amplification may be biased in favor of or against certain sequences or certain members of the labeled library. This can distort the quantitative measurement of the sequences in the labeled library. Also, sequencing platforms can introduce errors in sequencing. For example, sequences can have a per-base error rate of 0.5 to 1%. Amplification bias and sequencing errors introduce noise into the final sequencing product. This noise can decrease the sensitivity of detection. For example, sequence variants whose frequency in the labeled population is lower than the sequencing error rate can be mistaken for noise. Also, by yielding readings of sequences in quantities greater or less than their actual number in a population, amplification bias can distort measurements of copy number variation. Alternatively, a plurality of sequence readings from a single polynucleotide can be produced without amplification. This can be done, for example, with nanopore methods.
[196] This disclosure provides methods for precisely detecting and reading unique polynucleotides in a labeled set. In certain embodiments, this disclosure provides labeled polynucleotides that, when amplified and sequenced, or when sequenced a plurality of times to produce a plurality of sequence reads, provide information that enables progeny polynucleotides to be traced back to their uniquely labeled parent polynucleotide molecule. Collapsing families of amplified progeny polynucleotides reduces amplification bias by providing information about the original unique parent molecules. Collapsing also reduces sequencing errors
by eliminating from the sequencing data the erroneous variant sequences carried by individual progeny molecules.
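Collapsing a family of reads into one consensus can be sketched as below. This is a minimal illustration, not the disclosed pipeline: it assumes reads have already been grouped by their parent's tag and aligned to equal length, and the function name `collapse_families` is hypothetical. Each position takes the most common base across the family (a simple majority vote).

```python
from collections import Counter, defaultdict


def collapse_families(reads):
    """Collapse (tag, sequence) reads into one consensus sequence per tag.

    Assumes all reads sharing a tag are aligned and of equal length
    (a simplifying assumption for illustration).
    """
    families = defaultdict(list)
    for tag, seq in reads:
        families[tag].append(seq)

    consensus = {}
    for tag, seqs in families.items():
        # Majority vote per column; on a tie, Counter.most_common returns
        # the base that was encountered first in that column.
        columns = zip(*seqs)
        consensus[tag] = "".join(Counter(col).most_common(1)[0][0] for col in columns)
    return consensus


reads = [
    ("AAT", "ACGT"),
    ("AAT", "ACGT"),
    ("AAT", "ACTT"),  # one progeny read carries an error at the third base
    ("GCA", "TTGA"),
]
print(collapse_families(reads))
```

The single erroneous base in family AAT is outvoted, so that family collapses to ACGT.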
[197] Detecting and reading unique polynucleotides in the labeled library may involve two strategies. In one strategy, a sufficiently large subset of the amplified progeny polynucleotide pool is sequenced so that, for a large percentage of the unique labeled parent polynucleotides in the labeled parent polynucleotide set, a sequence read is produced for at least one amplified progeny polynucleotide in the family produced from that unique labeled parent polynucleotide. In a second strategy, the amplified progeny polynucleotide set is sampled for sequencing at a level that produces sequence reads from multiple progeny members of a family derived from a unique parent polynucleotide. Generating sequence reads from multiple progeny members of a family allows the sequences to be collapsed into consensus parent sequences.
[198] For example, sampling a number of amplified progeny polynucleotides from the amplified progeny polynucleotide set that is equal to the number of unique labeled parent polynucleotides in the labeled parent polynucleotide set (particularly when that number is at least 10,000) will statistically produce a sequence read for at least one of the progeny of about 68% of the labeled parent polynucleotides in the set, and
about 40% of the unique labeled parent polynucleotides in the original set will be represented by at least two progeny sequence reads. In certain embodiments, the set of amplified progeny polynucleotides is sampled sufficiently to produce an average of five to ten sequence reads for each family. Sampling 10 times as many molecules from the amplified progeny set as there are unique labeled parent polynucleotides will statistically produce sequence information about 99.995% of the families, and 99.95% of all families will be covered by a plurality of sequence reads. A consensus sequence can be constructed from the progeny polynucleotides in each family to reduce the error rate dramatically, from the nominal per-base sequencing error rate to a rate possibly many orders of magnitude lower. For example, if the sequencer has an error rate of 1% per base and the chosen family has 10 reads, a consensus sequence built from those 10 reads would have an error rate below 0.0001%. Accordingly, the number of amplified progeny molecules to be sequenced can be chosen so that a sequence whose frequency in the sample is no greater than the nominal per-base sequencing error rate of the sequencing platform used has at least a 99% chance of being represented by at least one read.
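The coverage and error figures in [198] can be checked with two short calculations. This sketch assumes, which the text does not state, that the number of reads per family follows a Poisson distribution whose mean equals the sampling ratio, and that the consensus is a per-base majority vote with independent errors; `family_coverage` and `consensus_error_bound` are hypothetical names.

```python
import math


def family_coverage(sampling_ratio):
    """Return (P[family has >= 1 read], P[family has >= 2 reads]).

    Assumes reads per family ~ Poisson(sampling_ratio), where sampling_ratio
    is molecules sequenced divided by unique labeled parent molecules.
    """
    p0 = math.exp(-sampling_ratio)                   # no reads
    p1 = sampling_ratio * math.exp(-sampling_ratio)  # exactly one read
    return 1 - p0, 1 - p0 - p1


def consensus_error_bound(n_reads, per_base_error):
    """Upper bound on the per-base error of a majority-vote consensus.

    Counts every outcome in which at least half of the reads are wrong
    (ties treated as failures), so it overstates the true error, which
    also requires the wrong bases to agree with each other.
    """
    k_min = math.ceil(n_reads / 2)
    return sum(
        math.comb(n_reads, k)
        * per_base_error**k
        * (1 - per_base_error) ** (n_reads - k)
        for k in range(k_min, n_reads + 1)
    )


at_least_one, at_least_two = family_coverage(10)
print(f"{at_least_one:.5%} covered, {at_least_two:.4%} covered twice")
print(f"consensus error bound: {consensus_error_bound(10, 0.01):.2e}")
```

At a 10x ratio this model gives about 99.995% and 99.95%, and the 10-read bound is on the order of 1e-8, well below 0.0001%. At a 1x ratio the same model gives about 63% and 26%, so the 1x figures quoted above are not reproduced by this simple model.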
[199] In another embodiment, the set of amplified progeny polynucleotides is sampled at a
level sufficient to produce a high probability, for example, at least 90%, that a sequence represented in the labeled parent polynucleotide set at a frequency approximately equal to the per-base sequencing error rate of the sequencing platform used is covered by at least one sequence read and, preferably, a plurality of sequence reads. So, for example, if the sequencing platform has a per-base error rate of 0.2% and a sequence or set of sequences is represented in the labeled parent polynucleotide set at a frequency of about 0.2%, then the number of polynucleotides sequenced from the amplified progeny pool can be about X times the number of unique molecules in the labeled parent polynucleotide set.
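The sampling depth in [199] can be estimated in a similar way. The sketch below again assumes Poisson sampling (not stated in the text) and returns a total number of sampled molecules rather than the multiple X left unspecified above; `molecules_needed` is a hypothetical helper. It finds the smallest number of sampled molecules N for which a sequence at frequency f is seen at least `min_reads` times with the requested probability.

```python
import math


def prob_at_least(k, mean):
    """P[X >= k] for X ~ Poisson(mean)."""
    return 1 - sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))


def molecules_needed(freq, target_prob, min_reads=1):
    """Smallest N such that a sequence at fraction `freq` of the pool gets
    at least `min_reads` reads with probability >= target_prob.

    Assumes the number of reads of that sequence is Poisson with mean N * freq.
    """
    n = 1
    while prob_at_least(min_reads, n * freq) < target_prob:
        n += 1
    return n


# A sequence at 0.2%, i.e. at the per-base error rate in the example above.
for prob in (0.90, 0.99):
    print(prob, molecules_needed(0.002, prob), molecules_needed(0.002, prob, min_reads=2))
```

For a single read the closed form is N >= -ln(1 - p) / f, which gives 2,303 molecules for p = 0.99; the loop is used so the same helper also handles a plurality of reads.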
[200] These methods can be combined with any of the noise reduction methods described, including, for example, qualifying sequence reads for inclusion in the groups of sequences used to generate consensus sequences.
[201] This information can now be used for both qualitative and quantitative analysis. For example, for quantitative analysis, a measurement, for example, a count, of the number of labeled parent molecules mapping to a reference sequence is determined. This measurement can be compared to a measurement of labeled parent molecules mapping to a different genomic region. That is, the number of labeled parent molecules mapping to a first
mappable location or position in a reference sequence, such as the human genome, can be compared to a measurement of labeled parent molecules mapping to a second mappable location or position in a reference sequence. This comparison can reveal, for example, the relative amounts of parent molecules that map to each region. This, in turn, provides an indication of copy number variation for molecules that map to a particular region. For example, if the measurement of polynucleotides that map to a first reference sequence is greater than the measurement of polynucleotides that map to a second reference sequence, this may indicate that the parent population and, by extension, the original sample included polynucleotides from cells that exhibit aneuploidy. Measurements can be normalized against a control sample to eliminate various biases. Quantitative measurements can include, for example, number, count or frequency (whether relative, inferred or absolute).
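The quantitative comparison in [201] reduces to counting parent molecules per region and comparing ratios against a control. A minimal sketch, assuming reads have already been mapped and collapsed to (region, parent_id) pairs; the region names, identifiers and helper names are hypothetical. Counting distinct parent identifiers rather than raw reads is what keeps amplification duplicates out of the measurement.

```python
from collections import Counter


def count_parents(mapped_reads):
    """Count distinct parent molecules per region.

    mapped_reads: iterable of (region, parent_id) pairs, one per read.
    Reads from the same parent (PCR duplicates) are counted once.
    """
    return Counter(region for region, _ in set(mapped_reads))


def normalized_ratio(sample, control, region_a, region_b):
    """(a/b ratio in the sample) divided by (a/b ratio in the control)."""
    return (sample[region_a] / sample[region_b]) / (control[region_a] / control[region_b])


reads = [("chr21", "m1"), ("chr21", "m1"), ("chr21", "m2"), ("chr1", "m3")]
print(count_parents(reads))  # the two m1 reads count as one parent molecule

sample = {"chr21": 1200, "chr1": 1000}
control = {"chr21": 1000, "chr1": 1000}
print(normalized_ratio(sample, control, "chr21", "chr1"))
```

A ratio above 1 indicates relatively more parent molecules at `region_a` than the control predicts; how far above 1 counts as a copy number change is not specified in the text.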
[202] A reference genome can include the genome of any species of interest. Human genome sequences useful as references can include the hg19 assembly or any previous or available hg assembly. Such sequences can be accessed using the genome browser available at genome.ucsc.edu/index.html. Genomes of other species include, for example, PanTro2 (chimpanzee) and mm9 (mouse).
[203] For qualitative analysis, the sequences of a set of labeled polynucleotides that map to a reference sequence can be analyzed for
variant sequences, and their frequency in the population of labeled parent polynucleotides can be measured.
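For the qualitative analysis in [203], the frequency of a variant at one position can be computed over parent molecules rather than raw reads, for example from one consensus base per family. A minimal sketch; the input format and the name `variant_frequencies` are assumptions for illustration.

```python
from collections import Counter


def variant_frequencies(parent_bases):
    """Fraction of labeled parent molecules carrying each base at one position.

    parent_bases: one base per parent molecule, e.g. its family consensus base.
    """
    counts = Counter(parent_bases)
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items()}


# 998 reference-allele parents and 2 variant parents at one position.
freqs = variant_frequencies("A" * 998 + "T" * 2)
print(freqs)
```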
II. Sample Preparation
A. Polynucleotide Isolation and Extraction
[204] The systems and methods of this disclosure can have a wide variety of uses in the manipulation, preparation, identification and/or quantification of cell-free polynucleotides. Examples of polynucleotides include, but are not limited to: DNA, RNA, amplicons, cDNA, dsDNA, ssDNA, plasmid DNA, cosmid DNA, high molecular weight (MW) DNA, chromosomal DNA, genomic DNA, viral DNA, bacterial DNA, mtDNA (mitochondrial DNA), mRNA, rRNA, tRNA, nRNA, siRNA, snRNA, snoRNA, scaRNA, microRNA, dsRNA, ribozyme, riboswitch and viral RNA (for example, retroviral RNA).
[205] Cell-free polynucleotides can be derived from a variety of sources, including human, mammal, non-human mammal, simian, monkey, chimpanzee, reptile, amphibian or bird sources. In addition, samples can be extracted from a variety of bodily fluids containing cell-free sequences, including, but not limited to, blood, serum, plasma, vitreous, sputum, urine, tears, perspiration, saliva, semen, mucosal secretions, mucus, spinal fluid, amniotic fluid, lymphatic fluid and the like. Cell-free polynucleotides may be of fetal origin (via fluid taken from a pregnant individual), or they may be derived from the individual's own tissue.
[206] Isolation and extraction of cell-free polynucleotides can be accomplished by collecting
bodily fluids using a variety of techniques. In some cases, the collection may include aspiration of an individual's body fluid using a syringe.
In other cases, the collection can comprise pipetting or direct collection of fluid into a collection container.
[207] After collection of bodily fluids, the cell-free polynucleotides can be isolated and
extracted using a variety of techniques known in the field. In some cases, cell-free DNA can be isolated, extracted and prepared using commercially available kits, such as the Qiagen QIAamp® Circulating Nucleic Acid Kit protocol. In other examples, the Qiagen Qubit™ dsDNA HS Assay kit protocol, the Agilent™ DNA 1000 kit or the TruSeq™ Sequencing Library Preparation; Low-Throughput (LT) protocol can be used.
[208] In general, cell-free polynucleotides are extracted and isolated from bodily fluids through a separation step in which cell-free DNA, as found in solution, is separated from cells and other insoluble components of the bodily fluid. Separation may include, without limitation, techniques such as centrifugation or filtration. In other cases, the cells are not first separated from the cell-free DNA but are instead lysed. In this example, the genomic DNA of intact cells is separated by selective precipitation. Cell-free polynucleotides, including DNA, can remain soluble and can be separated from the insoluble genomic DNA and extracted. In general, after the addition of buffers and other wash steps
specific to different kits, DNA can be precipitated using isopropanol precipitation. Additional cleanup steps, such as silica-based columns, can be used to remove contaminants and salts. The general steps can be optimized for specific applications. Non-specific bulk carrier polynucleotides, for example, can be added throughout the reaction to optimize certain aspects of the procedure, such as yield.
[209] Isolation and purification of cell-free DNA can be accomplished using any means, including, but not limited to, commercial kits and protocols provided by companies such as Sigma Aldrich, Life Technologies, Promega, Affymetrix, IBI or the like. Kits and protocols that are not commercially available may also be used.
[210] After isolation, in some cases, the cell-free polynucleotides are pre-mixed with one or more additional materials, such as one or more reagents (e.g., ligase, protease, polymerase), before sequencing.
[211] One method to increase conversion efficiency involves using a ligase engineered for optimal reactivity on single-stranded DNA, such as a ThermoPhage ssDNA ligase derivative. Such ligases bypass the end-repair and A-tailing steps of traditional library preparation, which can have poor efficiencies and/or accumulated losses due to intermediate cleanup steps, and double the probability that either the sense or the antisense starting polynucleotide is
converted into an appropriately labeled polynucleotide. They also convert double-stranded polynucleotides that may have overhangs that may not be sufficiently blunted by the typical end-repair reaction. The optimal reaction conditions for this ssDNA reaction are: 1x reaction buffer (50 mM MOPS (pH 7.5), 1 mM DTT, 5 mM MgCl2, 10 mM KCl) with 50 mM ATP, 25 mg/ml BSA, 2.5 mM MnCl2, 200 pmol of 85-nt ssDNA oligomer and 5 U of ssDNA ligase, incubated at 65 °C for 1 hour. Subsequent amplification using PCR can further convert the labeled single-stranded library into a double-stranded library and yield an overall conversion efficiency well over 20%. Other methods for increasing the conversion rate, for example, to more than 10%, include any of the following, alone or in combination: annealing-optimized molecular inversion probes, blunt-end ligation with a well-controlled polynucleotide size range, sticky-end ligation, or an upfront multiplex amplification step with or without the use of fusion primers.
B. Molecular Barcoding of Cell-Free Polynucleotides
[212] The systems and methods of this disclosure can also enable cell-free polynucleotides to be tagged and tracked to allow subsequent identification of the particular polynucleotide and its origin. This feature is in contrast to other methods that use pooled or multiplex reactions and that only provide measurements or analyses as an average of
multiple samples. Here, assigning an identifier to individual polynucleotides or subgroups of polynucleotides can allow a unique identity to be assigned to individual sequences or sequence fragments. This can allow data to be acquired from individual samples rather than being limited to averages of samples.
[213] In some examples, nucleic acids or other molecules derived from a single strand may share a common tag or identifier and can therefore later be identified as derived from that strand. Similarly, all fragments of a single nucleic acid strand can be tagged with the same identifier or tag, thereby allowing subsequent identification of fragments from the parent strand. In other cases, gene expression products (for example, mRNA) can be labeled to quantify expression, whereby the barcode, or the barcode in combination with the sequence to which it is attached, can be counted. In still other cases, the systems and methods can be used as a PCR amplification control. In such cases, multiple amplification products from a PCR reaction can be tagged with the same tag or identifier. If the products are subsequently sequenced and show sequence differences, the differences among products with the same identifier can then be attributed to PCR error.
[214] In addition, individual sequences can be identified based on data characteristics of
the sequence reads themselves. For example, unique sequence data at the beginning (start) and end (stop) portions of individual sequencing reads can be used, alone or in combination with the length, or number of base pairs, of each sequence read, to assign unique identities to individual molecules. Fragments from a single nucleic acid strand that has been assigned a unique identity can thereby be subsequently identified as fragments of the parent strand. This can be used in conjunction with restricting the initial genetic material to limit diversity.
[215] In addition, unique sequence data at the beginning (start) and end (stop) portions of individual sequencing reads, and the sequencing read length, can be used, alone or in combination, with barcodes. In some cases, the barcodes may be unique, as described in this document. In other cases, the barcodes themselves may not be unique. In that case, non-unique barcodes, in combination with sequence data at the beginning (start) and end (stop) portions of individual sequencing reads and the sequencing read length, may allow a unique identity to be assigned to individual sequences. Similarly, fragments from a single nucleic acid strand that has been assigned a unique identity can thereby be subsequently identified as fragments of the parent strand.
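Combining non-unique barcodes with start and stop positions, as described in [214] and [215], amounts to grouping reads on a composite key. The sketch below assumes reads are given as hypothetical (start, stop, barcode, sequence) tuples after alignment; the grouping key is the only point being illustrated, and `group_by_identity` is a hypothetical name.

```python
from collections import defaultdict


def group_by_identity(reads):
    """Group reads into putative parent molecules keyed by (start, stop, barcode).

    reads: iterable of (start, stop, barcode, sequence) tuples. Reads that
    share all three values are treated as progeny of the same parent.
    """
    molecules = defaultdict(list)
    for start, stop, barcode, seq in reads:
        molecules[(start, stop, barcode)].append(seq)
    return dict(molecules)


reads = [
    (1000, 1166, "ACG", "GATT"),
    (1000, 1166, "ACG", "GATT"),  # PCR copy of the read above
    (1000, 1166, "TTA", "GACT"),  # same start/stop, different barcode: distinct parent
    (1000, 1172, "ACG", "CCTA"),  # same barcode, different stop: distinct parent
]
print(len(group_by_identity(reads)))
```

Neither the barcode alone nor the start/stop pair alone separates all three parents here; the combination does.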
[216] In general, the methods and systems provided in this document are useful for preparing cell-free polynucleotide sequences for a downstream sequencing reaction. In some examples, the sequencing method is classic Sanger sequencing. Sequencing methods may include, but are not limited to: high-throughput sequencing, pyrosequencing, sequencing-by-synthesis, single-molecule sequencing, nanopore sequencing, semiconductor sequencing, sequencing-by-ligation, sequencing-by-hybridization, RNA-Seq (Illumina), Digital Gene Expression (Helicos), next-generation sequencing, Single Molecule Sequencing by Synthesis (SMSS) (Helicos), massively parallel sequencing, Clonal Single Molecule Array (Solexa), shotgun sequencing, Maxam-Gilbert sequencing, primer walking and any other sequencing methods known in the art.
C. Assigning Barcodes to Cell-Free Polynucleotide Sequences
[217] The systems and methods disclosed in this document can be used in applications that involve the assignment of unique or non-unique identifiers, or molecular barcodes, to cell-free polynucleotides. Typically, the identifier is a barcode oligonucleotide that is used to tag the polynucleotide; but in some cases, different unique identifiers are used. For example, in some cases, the unique identifier is a
hybridization probe. In other cases, the unique identifier is a dye, in which case attachment may comprise intercalation of the dye into the analyte molecule (such as intercalation into DNA or RNA) or binding to a probe labeled with the dye. In still other cases, the unique identifier may be a nucleic acid oligonucleotide, in which case attachment to the polynucleotide sequences may comprise a ligation reaction between the oligonucleotide and the sequences or incorporation via PCR. In other cases, the reaction may include the addition of a metal isotope, either directly to the analyte or through a probe labeled with the isotope. In general, the assignment of unique or non-unique identifiers, or molecular barcodes, in reactions of this disclosure may follow the methods and systems described, for example, in U.S. Patent Application Publication Nos. 20010053519, 20030152490 and 20110160078 and U.S. Patent No. 6,582,908.
[218] Typically, the method comprises attaching oligonucleotide barcodes to nucleic acid analytes via an enzymatic reaction, including, but not limited to, a ligation reaction. For example, the ligase enzyme can covalently attach a DNA barcode to fragmented DNA (for example, high molecular weight DNA). After the barcodes are attached, the molecules can be subjected to a sequencing reaction.
[219] However, other reactions can be used as well. For example, oligonucleotide primers containing barcode sequences can be used in amplification reactions (e.g., PCR, qPCR, reverse transcriptase PCR, digital PCR, etc.) of the DNA template analytes, thereby producing labeled analytes. After barcodes are assigned to individual cell-free polynucleotide sequences, the pool of molecules can be sequenced.
[220] In some cases, PCR can be used for global amplification of cell-free polynucleotide sequences. This may include using adapter sequences that can first be ligated to different molecules, followed by PCR amplification using universal primers. PCR for sequencing can be performed using any means, including, but not limited to, the use of commercial kits provided by Nugen (WGA kit), Life Technologies, Affymetrix, Promega, Qiagen and the like. In other cases, only certain target molecules within a population of cell-free polynucleotide molecules can be amplified. Specific primers can, in conjunction with adapter ligation, be used to selectively amplify certain targets for downstream sequencing.
[221] Unique identifiers (for example, oligonucleotide barcodes, antibodies, probes, etc.) can be introduced into the cell-free polynucleotide sequences randomly or non-randomly. In some cases, they are introduced at an expected ratio of unique identifiers to microwells. For example, unique identifiers can be loaded so that more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, 100, 500, 1,000, 5,000, 10,000, 50,000, 100,000, 500,000, 1,000,000, 10,000,000, 50,000,000 or 1,000,000,000 unique identifiers
are loaded per genome sample. In some cases, unique identifiers can be loaded so that fewer than about 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, 100, 500, 1,000, 5,000, 10,000, 50,000, 100,000, 500,000, 1,000,000, 10,000,000, 50,000,000 or 1,000,000,000 unique identifiers are loaded per genome sample. In some cases, the average number of unique identifiers loaded per genome sample is less than or greater than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, 100, 500, 1,000, 5,000, 10,000, 50,000, 100,000, 500,000, 1,000,000, 10,000,000, 50,000,000 or 1,000,000,000.
[222] In some cases, unique identifiers can be of various lengths, such that each barcode is at least about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, 100, 500 or 1,000 base pairs. In other cases, barcodes may comprise fewer than 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, 100, 500 or 1,000 base pairs.
[223] In some cases, unique identifiers may be predetermined, random or semi-random sequence oligonucleotides. In other cases, a plurality of barcodes may be used such that the barcodes are not necessarily unique to one another in the plurality. In this example, barcodes can be attached to individual molecules such that the combination of the barcode and the sequence to which it is attached creates a unique sequence that can be individually tracked. As described herein, the
detection of non-unique barcodes in combination with sequence data from the beginning (start) and end (stop) portions of sequence reads can allow the assignment of a unique identity to a particular molecule. The length, or number of base pairs, of an individual sequence read can also be used to assign a unique identity to that molecule. As described herein, fragments from a single nucleic acid strand that has been assigned a unique identity can thereby be subsequently identified as fragments of the parent strand. In this way, the polynucleotides in the sample can be uniquely or substantially uniquely identified.
[224] Unique identifiers can be used to tag a wide range of analytes, including, but not limited to, RNA or DNA molecules. For example, unique identifiers (for example, barcode oligonucleotides) can be attached to entire strands of nucleic acids or to fragments of nucleic acids (for example, fragmented genomic DNA, fragmented RNA). Unique identifiers (for example, oligonucleotides) can also bind to gene expression products, genomic DNA, mitochondrial DNA, RNA, mRNA and the like.
[225] In many applications, it can be important to determine whether individual cell-free polynucleotide sequences each receive a different unique identifier (for example, oligonucleotide barcode). If the population of unique identifiers introduced into the systems and methods is not sufficiently diverse, different analytes can possibly be labeled with identical identifiers. The systems and methods disclosed in this document enable the detection of cell-free polynucleotide sequences labeled with the same identifier. In some cases, a reference sequence may be included with the population of cell-free polynucleotide sequences to be analyzed. The reference sequence can be, for example, a nucleic acid with a known sequence and a known quantity. If the unique identifiers are oligonucleotide barcodes and the analytes are nucleic acids, the labeled analytes can subsequently be sequenced and quantified. These methods can indicate whether one or more fragments and/or analytes have been assigned identical barcodes.
[226] A method disclosed in this document may comprise using the reagents necessary for assigning barcodes to analytes. In the case of ligation reactions, reagents including, but not limited to, ligase enzyme, buffer, adapter oligonucleotides, a plurality of unique identifier DNA barcodes and the like can be loaded into the systems and methods. In the case of enrichment, reagents including, without limitation, a plurality of PCR primers, oligonucleotides containing a unique identification sequence or barcode sequence, DNA polymerase, dNTPs, buffer and the like can be used in preparation for sequencing.
[227] In general, the method and system of this disclosure may use the methods of U.S. Patent No. 7,537,897 for using molecular barcodes to count molecules or analytes.
[228] In a sample comprising fragmented genomic DNA, for example, cell-free DNA (cfDNA), from a plurality of genomes, there is some likelihood that more than one polynucleotide from different genomes will have the same start and stop positions (duplicates or cognates). The probable number of duplicates starting at any position is a function of the number of haploid genome equivalents in a sample and the fragment size distribution. For example, cfDNA has a fragment peak at about 160 nucleotides, and most fragments in that peak are in the range of about 140 to 180 nucleotides. Accordingly, the cfDNA from a genome of about 3 billion bases (for example, the human genome) can be composed of almost 20 million (2x10^7) polynucleotide fragments. A sample of about 30 ng of DNA can contain about 10,000 human haploid genome equivalents. (Similarly, a sample of about 100 ng of DNA can contain about 30,000 human haploid genome equivalents.) A sample containing about 10,000 (10^4) haploid genome equivalents of such DNA can contain about 200 billion (2x10^11) individual polynucleotide molecules. It has been empirically determined that in a sample of about 10,000 haploid genome equivalents of human DNA, there are about 3 duplicate polynucleotides that start at any given position. Therefore, such a collection may contain a diversity
of about 6x10^10 to 8x10^10 (about 60 billion to 80 billion, for example, about 70 billion (7x10^10)) distinctly sequenced polynucleotide molecules.
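The estimates in [228] follow from simple arithmetic, sketched below. The 3.3 pg mass per haploid human genome is a commonly used figure (the text rounds it to about 3 pg in [231]); the 160-nucleotide peak and the 3 duplicates per start position are taken from the text. All constant and function names are illustrative.

```python
GENOME_BASES = 3e9        # approximate haploid human genome size, bases
FRAGMENT_PEAK_NT = 160    # cfDNA fragment-size peak from the text
PG_PER_GENOME = 3.3       # approximate mass of one haploid human genome, pg
DUPLICATES_PER_START = 3  # empirical figure quoted in the text


def genome_equivalents(ng):
    """Haploid genome equivalents in `ng` nanograms of human DNA."""
    return ng * 1000 / PG_PER_GENOME


fragments_per_genome = GENOME_BASES / FRAGMENT_PEAK_NT  # ~1.9e7, "almost 20 million"
total_molecules = 1e4 * fragments_per_genome            # 10,000 genome equivalents
distinct_molecules = total_molecules / DUPLICATES_PER_START

print(f"{genome_equivalents(30):,.0f} genome equivalents in 30 ng")
print(f"{genome_equivalents(100):,.0f} genome equivalents in 100 ng")
print(f"{total_molecules:.2e} molecules, {distinct_molecules:.2e} distinct")
```

This gives about 9,100 and 30,300 genome equivalents, about 1.9x10^11 molecules and about 6.3x10^10 distinct molecules, in line with the rounded values above.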
[229] The probability of correctly identifying the molecules depends on the initial number of genome equivalents, the length distribution of the sequenced molecules, sequencing uniformity and the number of tags. A tag count of one is equivalent to having no unique tags, that is, no tagging. The table below lists the probability of correctly identifying a molecule as unique, assuming a typical cell-free size distribution as above.
Percentage of molecules correctly identified as unique, by tag count:
Tag count   100 human haploid genome equivalents   3,000 human haploid genome equivalents
1           96.9643                                91.7233
4           99.2290                                97.8178
9           99.6539                                99.0198
16          99.8064                                99.4424
25          99.8741                                99.6412
100         99.9685                                99.9107
[230] In this case, when sequencing genomic DNA, it may not be possible to determine which sequence reads are derived from which parent molecules. This problem can be mitigated by labeling the parent molecules with a sufficient number of unique identifiers (for example, the tag count) so that it is likely that two duplicate molecules, that is, molecules that have the same start and stop positions, carry different unique identifiers, allowing the sequence reads to be traced to particular parent molecules. One approach to this problem is to uniquely label all, or almost all, of the different parent molecules in the sample. However, depending on the number of haploid genome equivalents and the fragment size distribution in the sample, this may require billions of different unique identifiers.
[231] This method can be cumbersome and costly. This invention provides methods and compositions in which a population of polynucleotides in a fragmented genomic DNA sample is labeled with n different unique identifiers, where n is at least 2 and no more than 100,000*z, and z is a measure of central tendency (for example, mean, median, mode) of the expected number of duplicate molecules that have the same start and stop positions. In certain embodiments, n is at least any of 2*z, 3*z, 4*z, 5*z, 6*z, 7*z, 8*z, 9*z, 10*z, 11*z, 12*z, 13*z, 14*z, 15*z, 16*z, 17*z, 18*z, 19*z or 20*z (e.g., lower limit). In other embodiments, n is not
greater than 100,000*z, 10,000*z, 1,000*z or 100*z (e.g., upper limit). Therefore, n can be in the range between any combination of these lower and upper limits. In certain embodiments, n is between 5*z and 15*z, between 8*z and 12*z, or about 10*z. For example, a human haploid genome equivalent has about 3 picograms of DNA. A sample of about 1 microgram of DNA contains about 300,000 human haploid genome equivalents. The number n can be between 15 and 45, between 24 and 36, or about 30. Improvements in sequencing can be achieved as long as at least some of the duplicate or cognate polynucleotides carry unique identifiers, that is, carry different tags. However, in certain embodiments, the number of tags used is selected so that there is at least a 95% chance that all duplicate molecules starting at any one position carry unique identifiers. For example, a sample comprising about 10,000 haploid human genome equivalents of cfDNA can be tagged with about 36 unique identifiers. The unique identifiers can comprise six unique DNA barcodes; attaching them to both ends of a polynucleotide produces 36 possible unique identifiers. Samples so labeled can be those with a range of about 10 ng to any one of about 100 ng, about 1 µg or about 10 µg of fragmented polynucleotides, for example, genomic DNA, for example, cfDNA.
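The tag-count arithmetic in [231] can be sketched as follows. `tag_range` applies the multiples of z given in the text; `dual_end_identifiers` counts barcode pairs when one of k barcodes is attached to each end, assuming the two ends are distinguishable; and `prob_all_distinct` is a standard birthday-problem calculation that assumes each duplicate draws a tag independently and uniformly, an assumption the text does not state. All names are hypothetical.

```python
import math


def tag_range(z, low=5, high=15):
    """Tag counts spanning low*z to high*z."""
    return low * z, high * z


def dual_end_identifiers(k):
    """Distinct identifiers from k barcodes, one on each (distinguishable) end."""
    return k * k


def prob_all_distinct(m, n):
    """Probability that m duplicates each draw a different one of n tags,
    assuming independent, uniform tag assignment."""
    if m > n:
        return 0.0
    return math.prod((n - i) / n for i in range(m))


z = 3  # expected duplicates per start position, as in the text
print(tag_range(z), tag_range(z, 8, 12))
print(dual_end_identifiers(6))
print(round(prob_all_distinct(z, 36), 3))
```

With z = 3 the ranges come out as 15 to 45 and 24 to 36, and six barcodes on both ends give 36 identifiers. Under the uniform-assignment model, three duplicates drawn from 36 identifiers all differ with a probability of about 0.92; real barcode usage may be non-uniform, and the number of duplicates varies around z.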
[232] Thus, this invention also provides labeled polynucleotide compositions. The
polynucleotides can comprise fragmented DNA, for example, cfDNA. A set of polynucleotides in the composition that map to a mappable base position in a genome can be non-uniquely labeled, that is, the number of different identifiers can be at least 2 and fewer than the number of polynucleotides that map to the mappable base position. A composition of between about 10 ng and about 10 µg (for example, any of about 10 ng to 1 µg, about 10 ng to 100 ng, about 100 ng to 10 µg, about 100 ng to 1 µg, or about 1 µg to 10 µg) can carry between any of 2, 5, 10, 50 or 100 and any of 100, 1,000, 10,000 or 100,000 different identifiers. For example, between 5 and 100 different identifiers can be used to tag polynucleotides in such a composition.
III. Nucleic Acid Sequencing Platforms
[233] After extraction and isolation of cell-free polynucleotides from bodily fluids, the cell-free sequences can be sequenced. In some examples, the sequencing method is classic Sanger sequencing. Sequencing methods may include, but are not limited to: high-throughput sequencing, pyrosequencing, sequencing-by-synthesis, single-molecule sequencing, nanopore sequencing, semiconductor sequencing, sequencing-by-ligation, sequencing-by-hybridization, RNA-Seq (Illumina), Digital Gene Expression (Helicos), next-generation sequencing, Single Molecule Sequencing by Synthesis (SMSS) (Helicos), massively parallel sequencing, Clonal Single Molecule Array
(Solexa), shotgun sequencing, Maxam-Gilbert sequencing, primer walking, sequencing using PacBio, SOLiD, Ion Torrent or Nanopore platforms and any other sequencing methods known in the art.
[234] In some cases, the various types of sequencing reactions described above may comprise a variety of sample processing units. Sample processing units may include, without limitation, multiple lanes, multiple channels, multiple wells or other means of processing multiple sets of samples substantially simultaneously. In addition, the sample processing unit can include multiple chambers to enable the processing of multiple runs simultaneously.
[235] In some examples, simultaneous sequencing reactions can be performed using multiplex sequencing. In some cases, cell-free polynucleotides can be sequenced with at least 1,000, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 50,000 or 100,000 sequencing reactions. In other cases, cell-free polynucleotides can be sequenced with fewer than 1,000, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 50,000 or 100,000 sequencing reactions. Sequencing reactions can be carried out sequentially or simultaneously. Subsequent data analysis can be performed on all or part of the sequencing reactions. In some cases, data analysis can be performed on at least 1,000, 2,000, 3,000, 4,000,
Petition 870160049132, of 9/5/2016, p. 111/177
109/159
5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 50,000, 100,000 sequencing reactions. In other cases, data analysis can be performed in less than 1,000, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 50,000, 100,000 sequencing reactions.
[236] In other examples, the number of sequencing reactions can provide coverage for different amounts of the genome. In some cases, sequence coverage of the genome can be at least 5%, 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 99%, 99.9% or 100%. In other cases, sequence coverage of the genome can be less than 5%, 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 99%, 99.9% or 100%.
[237] In some examples, sequencing can be performed on cell-free polynucleotides that comprise a variety of different types of nucleic acids. Nucleic acids can be polynucleotides or oligonucleotides. Nucleic acids include, without limitation, DNA or RNA, single-stranded or double-stranded, or an RNA/cDNA pair.
IV. Polynucleotide Analysis Strategy
[238] Figure 8 is a diagram, 800, which shows a strategy for analyzing polynucleotides in a sample of initial genetic material. In step 802, a sample containing initial genetic material is provided. The sample may include target nucleic acid in low abundance. For example, nucleic acid from a normal or wild-type genome (for example, a germline genome) may predominate in a sample that also includes no more than 20%, no more than 10%, no more than 5%, no more than 1%, no more than 0.5% or no more than 0.1% nucleic acid from at least one other genome containing genetic variation, for example, a cancer genome, a fetal genome or a genome of another species. The sample may include, for example, cell-free nucleic acid or cells comprising nucleic acid. The initial genetic material may constitute no more than 100 ng of nucleic acid. This can contribute to appropriate oversampling of the original polynucleotides by the sequencing or genetic analysis process. Alternatively, the sample can be limited or artificially restricted to reduce the amount of nucleic acid to no more than 100 ng, or selectively enriched to analyze only the sequences of interest. The sample can be modified to selectively produce sequence readings of molecules that map to each of one or more selected locations in a reference sequence. A 100 ng nucleic acid sample can contain about 30,000 human haploid genome equivalents, that is, molecules that together provide 30,000-fold coverage of a human genome.
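The 30,000 figure can be checked from the mass of one haploid human genome. The following is a minimal Python sketch, assuming about 3.1e9 bp per haploid genome and an average of about 650 g/mol per double-stranded base pair; both constants are approximate textbook values, not values taken from this document.

```python
AVOGADRO = 6.022e23          # molecules per mole
BP_MASS_G_PER_MOL = 650.0    # assumed mean mass of one double-stranded base pair
HAPLOID_GENOME_BP = 3.1e9    # assumed length of one haploid human genome


def haploid_genome_mass_pg(genome_bp: float = HAPLOID_GENOME_BP) -> float:
    """Approximate mass of one haploid genome, in picograms."""
    grams = genome_bp * BP_MASS_G_PER_MOL / AVOGADRO
    return grams * 1e12


def genome_equivalents(input_ng: float) -> float:
    """Haploid genome equivalents contained in `input_ng` nanograms of DNA."""
    return input_ng * 1e3 / haploid_genome_mass_pg()  # 1 ng = 1,000 pg


print(f"{haploid_genome_mass_pg():.2f} pg per haploid genome")
print(f"{genome_equivalents(100):,.0f} genome equivalents in 100 ng")
```

With these constants one haploid genome weighs about 3.3 pg, so 100 ng corresponds to roughly 30,000 genome equivalents.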
[239] In step 804, the initial genetic material is converted into a set of labeled parent polynucleotides. Labeling may include attaching sequence tags to molecules in the initial genetic material. Sequence tags can be selected so that all unique polynucleotides that map to the same location in a reference sequence have a unique identification tag. The conversion can be carried out at high efficiency, for example, at least 50%.
[240] In step 806, the set of labeled parent polynucleotides is amplified to produce a corresponding set of amplified progeny polynucleotides. The amplification can be, for example, 1,000 times.
[241] In step 808, the set of amplified progeny polynucleotides is sampled for sequencing. The sampling rate is chosen so that the sequence readings produced both (1) cover a target number of molecules in the set of labeled parent polynucleotides and (2) cover unique molecules in the set of labeled parent polynucleotides at a target fold coverage (for example, 5- to 10-fold coverage of parent polynucleotides).
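As a rough illustration of how a sampling rate might be chosen, the sketch below assumes uniform amplification and random (Poisson) sampling of progeny molecules, neither of which is specified in this document. Under that assumption, a parent sampled at a mean of lambda readings is observed at least once with probability 1 - exp(-lambda). The function names and the parent count are hypothetical.

```python
import math


def reads_needed(n_parent_molecules: int, target_fold: float) -> int:
    """Readings required so each parent is sampled `target_fold` times on average."""
    return math.ceil(n_parent_molecules * target_fold)


def fraction_parents_observed(n_reads: int, n_parent_molecules: int) -> float:
    """Expected fraction of parents seen at least once, under Poisson sampling."""
    mean_reads_per_parent = n_reads / n_parent_molecules
    return 1.0 - math.exp(-mean_reads_per_parent)


parents = 30_000  # hypothetical number of labeled parent molecules at one locus
for fold in (5, 10):
    n = reads_needed(parents, fold)
    print(fold, n, round(fraction_parents_observed(n, parents), 4))
```

Under these assumptions, 5-fold coverage is expected to observe about 99.3% of parents at least once, and 10-fold coverage nearly all of them.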
[242] In step 810, the set of sequence readings is collapsed to produce a set of consensus sequences corresponding to unique labeled parent polynucleotides. Sequence readings can be qualified for inclusion in the analysis. For example, sequence readings that fail to meet a quality control score can be removed from the pool. Sequence readings can be sorted into families that represent readings of progeny molecules derived from a particular unique parent molecule. For example, a family of amplified progeny polynucleotides may constitute those amplified molecules derived from a single parent polynucleotide. By comparing the sequences of progeny in a family, a consensus sequence of the original parent polynucleotide can be deduced. This produces a set of consensus sequences that represent unique parent polynucleotides in the labeled pool.
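A minimal sketch of step 810 in Python, assuming each reading is represented as a hypothetical (tag, mapping start, sequence) tuple. Families are keyed on tag plus mapping position, following the tagging scheme of step 804, and the consensus is a simple per-column majority vote; voting is only one of the consensus methods mentioned in this document.

```python
from collections import Counter, defaultdict


def group_into_families(reads):
    """Group (tag, start, sequence) readings into families keyed by tag and start."""
    families = defaultdict(list)
    for tag, start, seq in reads:
        families[(tag, start)].append(seq)
    return families


def majority_consensus(seqs):
    """Per-column majority vote over the readings of one family."""
    length = min(len(s) for s in seqs)  # ignore overhang of longer readings
    consensus = []
    for i in range(length):
        base, _count = Counter(s[i] for s in seqs).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


reads = [
    ("AAT", 100, "ACGTAC"),
    ("AAT", 100, "ACGTAC"),
    ("AAT", 100, "ACCTAC"),  # error introduced during amplification or sequencing
    ("GGC", 100, "ACGTTC"),
    ("GGC", 100, "ACGTTC"),
]
families = group_into_families(reads)
consensus = {key: majority_consensus(seqs) for key, seqs in families.items()}
print(consensus)
```

The single discordant reading in the first family is outvoted, so its consensus matches the two concordant readings.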
[243] In step 812, the set of consensus sequences is analyzed using any of the analytical methods described in this document. For example, consensus sequences that map to a particular reference sequence locus can be analyzed to detect instances of genetic variation. Consensus sequences that map to particular reference sequences can be counted and normalized against control samples. Measures of the molecules that map to reference sequences can be compared across a genome to identify areas in the genome where copy number varies or heterozygosity is lost.
[244] Figure 9 is a diagram showing a more generic method for extracting information from a signal represented by a collection of sequence readings. In this method, after sequencing of amplified progeny polynucleotides, the sequence readings are grouped into families of molecules amplified from a molecule of unique identity (910). This grouping can be a starting point for methods that interpret the information in the sequences to determine the contents of the labeled parent polynucleotides with higher fidelity, for example, with less noise and/or distortion.
[245] Analysis of the collection of sequence readings allows inferences to be made about the parent polynucleotide population from which the sequence readings were generated. Such inferences may be useful because sequencing typically involves reading only a subset of all the amplified polynucleotides. Therefore, one cannot be certain that each parent polynucleotide will be represented by at least one sequence reading in the collection of sequence readings.
[246] One such inference is the number of unique parent polynucleotides in the original pool. Such an inference can be made based on the number of unique families into which the sequence readings can be grouped and the number of sequence readings in each family. Here, a family refers to a collection of sequence readings traceable to an original parent polynucleotide. The inference can be made using well-known statistical methods. For example, if grouping produces many families, each represented by one or a few progeny, then it can be inferred that the original population included more unique parent polynucleotides that were not sequenced. On the other hand, if grouping produces only a few families, each represented by many progeny, then it can be inferred that most of the unique polynucleotides in the parent population are represented by at least one family of sequence readings.
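The text does not name a specific statistical method for this inference. One well-known option, swapped in here purely for illustration, is the Chao1 richness estimator from ecology, which extrapolates from the number of families seen exactly once (f1) and exactly twice (f2): N = S_obs + f1^2 / (2 * f2). The sketch below reproduces both qualitative cases described above.

```python
from collections import Counter


def chao1_unique_parents(family_sizes):
    """Chao1 estimate of the number of distinct parent molecules.

    `family_sizes` holds the number of sequence readings in each observed family.
    """
    observed = len(family_sizes)
    size_counts = Counter(family_sizes)
    f1 = size_counts.get(1, 0)  # families represented by exactly one reading
    f2 = size_counts.get(2, 0)  # families represented by exactly two readings
    if f2 > 0:
        return observed + f1 * f1 / (2 * f2)
    return observed + f1 * (f1 - 1) / 2  # bias-corrected form used when f2 == 0


# Many small families: unseen parents are likely.
print(chao1_unique_parents([1, 1, 1, 1, 2, 2, 3]))
# A few large families: nearly every parent was sampled.
print(chao1_unique_parents([8, 9, 10, 7, 12]))
```

In the first case the estimate (11) exceeds the 7 observed families; in the second it equals the 5 observed families.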
[247] Another such inference is the frequency of a base or sequence of bases at a particular locus in an original pool of polynucleotides. Such an inference can be made based on the number of unique families into which the sequence readings can be grouped and the number of sequence readings in each family. By analyzing the base calls at a locus in a family of sequence readings, a confidence score is assigned to each particular base call or sequence. Then, taking into account the confidence score of each base call across a plurality of families, the frequency of each base or sequence at the locus is determined.
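A sketch of the confidence-weighted frequency estimate described above. The confidence score used here, the fraction of family members agreeing with the family's majority call, is an illustrative assumption; the document does not define how the score is computed.

```python
from collections import Counter


def family_call(bases):
    """Majority base for one family at a locus, with its agreement fraction.

    The agreement fraction stands in for a confidence score here; it is an
    illustrative choice, not a score defined by this document.
    """
    base, n = Counter(bases).most_common(1)[0]
    return base, n / len(bases)


def locus_base_frequencies(families):
    """Confidence-weighted frequency of each base across families at one locus."""
    weights = Counter()
    for bases in families:
        base, confidence = family_call(bases)
        weights[base] += confidence
    total = sum(weights.values())
    return {base: w / total for base, w in weights.items()}


families_at_locus = [
    ["A", "A", "A", "A"],
    ["A", "A", "A", "G"],  # one discordant member lowers this family's weight
    ["G", "G", "G", "G"],
    ["A", "A"],
]
print(locus_base_frequencies(families_at_locus))
```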
V. Detection of Copy Number Variation
A. Detection of Copy Number Variation Using a Single Sample
[248] Figure 1 is a diagram, 100, showing a strategy for detecting copy number variation in a single individual. As shown in this document, methods for detecting copy number variation can be implemented as follows. After extraction and isolation of cell-free polynucleotides in step 102, a single sample can be sequenced by a nucleic acid sequencing platform known in the art in step 104. This step generates a plurality of genomic fragment sequence readings. In some cases, these sequence readings may contain barcode information. In other examples, barcodes are not used. After sequencing, quality scores are assigned to the readings. A quality score can be a representation of readings that indicates, based on a threshold, whether those readings may be useful in subsequent analysis. In some cases, some readings are not of sufficient quality or length to perform the subsequent mapping step. Sequence readings with a quality score of at least 90%, 95%, 99%, 99.9%, 99.99% or 99.999% can be retained by filtering the data set. In other cases, sequencing readings assigned a quality score of less than 90%, 95%, 99%, 99.9%, 99.99% or 99.999% can be filtered out of the data set. In step 106, the genomic fragment readings that meet a specified quality score threshold are mapped to a reference genome, or to a template sequence known not to contain copy number variations. After mapping alignment, a mapping score is assigned to the sequence readings. A mapping score can be a representation of readings mapped back to the reference sequence that indicates whether or not each position is uniquely mappable. In some cases, the readings may be sequences unrelated to the copy number variation analysis. For example, some sequence readings may originate from contaminating polynucleotides. Sequencing readings with a mapping score of at least 90%, 95%, 99%, 99.9%, 99.99% or 99.999% can be retained by filtering the data set. In other cases, sequencing readings assigned a mapping score of less than 90%, 95%, 99%, 99.9%, 99.99% or 99.999% can be filtered out of the data set.
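Quality thresholds expressed as percentages correspond to Phred-scaled scores through Q = -10 * log10(p_error), so a 99.9% base-call accuracy is Q30. The helper below converts between the two and applies a per-read filter that keeps readings at or above the threshold. Using mean per-base accuracy as the read-level score, and the example readings, are assumptions made for this sketch.

```python
import math


def phred_to_accuracy(q: float) -> float:
    """Probability that a base call is correct, given Phred score Q."""
    return 1.0 - 10 ** (-q / 10)


def accuracy_to_phred(accuracy: float) -> float:
    """Phred score for a base-call accuracy such as 0.999 (99.9%)."""
    return -10 * math.log10(1.0 - accuracy)


def passes_quality(phred_scores, min_accuracy=0.999):
    """Keep a reading if its mean per-base accuracy meets the threshold."""
    mean_accuracy = sum(phred_to_accuracy(q) for q in phred_scores) / len(phred_scores)
    return mean_accuracy >= min_accuracy


readings = {"r1": [40, 38, 37, 40], "r2": [20, 22, 18, 25]}  # hypothetical Phred scores
kept = [name for name, quals in readings.items() if passes_quality(quals)]
print(kept)
```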
[249] After data filtering and mapping, the plurality of sequence readings generates chromosomal regions of coverage. In step 108, these chromosomal regions can be divided into windows or bins. A window or bin can be at least 5 kb, 10 kb, 25 kb, 30 kb, 35 kb, 40 kb, 50 kb, 60 kb, 75 kb, 100 kb, 150 kb, 200 kb, 500 kb or 1,000 kb. A window or bin can also have up to 5 kb, 10 kb, 25 kb, 30 kb, 35 kb, 40 kb, 50 kb, 60 kb, 75 kb, 100 kb, 150 kb, 200 kb, 500 kb or 1,000 kb. A window or bin can also be about 5 kb, 10 kb, 25 kb, 30 kb, 35 kb, 40 kb, 50 kb, 60 kb, 75 kb, 100 kb, 150 kb, 200 kb, 500 kb or 1,000 kb.
[250] For coverage normalization in step 110, each window or bin is selected to contain about the same number of mappable bases. In some cases, each window or bin in a chromosomal region may contain exactly the same number of mappable bases. In other cases, each window or bin may contain a different number of mappable bases. In addition, each window or bin may not overlap with an adjacent window or bin. In other cases, a window or bin may overlap another adjacent window or bin. In some cases, a window or bin may overlap by at least 1 bp, 2 bp, 3 bp, 4 bp, 5 bp, 10 bp, 20 bp, 25 bp, 50 bp, 100 bp, 200 bp, 250 bp, 500 bp or 1,000 bp. In other cases, a window or bin may overlap by up to 1 bp, 2 bp, 3 bp, 4 bp, 5 bp, 10 bp, 20 bp, 25 bp, 50 bp, 100 bp, 200 bp, 250 bp, 500 bp or 1,000 bp. In other cases, a window or bin may overlap by about 1 bp, 2 bp, 3 bp, 4 bp, 5 bp, 10 bp, 20 bp, 25 bp, 50 bp, 100 bp, 200 bp, 250 bp, 500 bp or 1,000 bp.
[251] In some cases, each of the bin regions can be sized so as to contain about the same number of uniquely mappable bases. The mappability of each base that comprises a bin region is determined and used to generate a mappability file that contains a representation of readings from the references that are mapped back to the reference for each bin. The mappability file contains one line for each position, indicating whether or not each position is uniquely mappable.
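A minimal sketch of sizing bins so that each holds the same number of uniquely mappable bases, assuming the mappability file has been loaded as one 0/1 flag per position. Bins are non-overlapping and a trailing partial bin is discarded; both are simplifying choices for this example, and the function name is hypothetical.

```python
def equal_mappability_bins(mappable, bases_per_bin):
    """Split positions into consecutive bins holding `bases_per_bin` mappable bases each.

    `mappable` is a sequence of 0/1 flags, one per genomic position (1 = uniquely
    mappable). Bins are half-open (start, end) intervals; a trailing partial bin
    with fewer mappable bases is dropped.
    """
    bins = []
    start = 0
    count = 0
    for pos, flag in enumerate(mappable):
        count += flag
        if count == bases_per_bin:
            bins.append((start, pos + 1))
            start = pos + 1
            count = 0
    return bins


mappable = [1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1]
print(equal_mappability_bins(mappable, 3))
```

The resulting bins differ in length (5, 4 and 3 positions) but each contains exactly three uniquely mappable bases.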
[252] In addition, predefined windows known throughout the genome to be hard to sequence, or to contain a substantially high GC bias, can be filtered out of the data set. For example, regions known to fall near the chromosome centromere (i.e., centromeric DNA) are known to contain highly repetitive sequences that can produce false positive results. These regions can be filtered out. Other regions of the genome, such as regions that contain an unusually high concentration of other highly repetitive sequences, such as microsatellite DNA, can also be filtered out of the data set.
[253] The number of windows analyzed can also vary. In some cases, at least 10, 20, 30, 40, 50, 100, 200, 500, 1,000, 2,000, 5,000, 10,000, 20,000, 50,000 or 100,000 windows are analyzed. In other cases, up to 10, 20, 30, 40, 50, 100, 200, 500, 1,000, 2,000, 5,000, 10,000, 20,000, 50,000 or 100,000 windows are analyzed.
[254] For an exemplary genome derived from cell-free polynucleotide sequences, the next step comprises determining the read coverage for each bin region. This can be accomplished using either barcoded or non-barcoded readings. In cases without barcodes, the previous mapping step provides coverage for different base positions. Sequence readings that have sufficient quality and mapping scores and fall within chromosomal windows that are not filtered out can be counted. The number of coverage readings can be assigned a score for each mappable position. In cases involving barcodes, all sequences with the same barcode, physical properties or a combination of the two can be collapsed into one reading, since they are all derived from the same parent molecule. This step reduces biases that may have been introduced during any of the previous steps, such as steps that involve amplification. For example, if one molecule is amplified 10 times but another is amplified 1,000 times, each molecule is represented only once after collapsing, thus negating the effect of uneven amplification. Only readings with unique barcodes can be counted for each mappable position and influence the assigned score.
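The barcode-collapsing step can be sketched as follows, assuming hypothetical (barcode, mapping start) pairs as input and bins given by their sorted start coordinates. Readings sharing both values are counted once, so a molecule amplified 1,000 times contributes the same count as one amplified 10 times.

```python
from bisect import bisect_right


def count_unique_molecules_per_bin(reads, bin_starts):
    """Count collapsed molecules per bin.

    `reads` are (barcode, start_position) pairs; readings sharing both values are
    treated as copies of one parent molecule and counted once. `bin_starts` is a
    sorted list of bin start coordinates; a reading belongs to the last bin whose
    start is <= its position.
    """
    molecules = set(reads)  # collapse amplification duplicates
    counts = [0] * len(bin_starts)
    for _barcode, pos in molecules:
        idx = bisect_right(bin_starts, pos) - 1
        if idx >= 0:  # skip positions before the first bin
            counts[idx] += 1
    return counts


# One molecule amplified 1,000 times, another 10 times, two more seen once each.
reads = [("AC", 10)] * 1000 + [("GT", 12)] * 10 + [("AC", 55), ("TT", 57)]
print(count_unique_molecules_per_bin(reads, [0, 50]))
```

Without collapsing, the first bin would receive 1,010 readings against 2 in the second; after collapsing, both bins report two molecules.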
[255] Consensus sequences can be generated from families of sequence readings by any method known in the art. Such methods include, for example, linear or non-linear methods of building consensus sequences (such as voting, averaging, statistical, maximum a posteriori or maximum likelihood detection, dynamic programming, Bayesian, hidden Markov or support vector machine methods, etc.) derived from digital communication theory, information theory or bioinformatics.
[256] After the sequence read coverage has been determined, a stochastic modeling algorithm is applied to convert the normalized nucleic acid sequence read coverage for each bin region into discrete copy number states. In some cases, this algorithm may comprise one or more of the following: Hidden Markov model, dynamic programming, support vector machine, Bayesian network, Trellis decoding, Viterbi decoding, expectation maximization, Kalman filtering methodologies and neural networks.
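As an illustration of one listed option, a three-state hidden Markov model decoded with the Viterbi algorithm is sketched below. The emission model (Gaussian around copy number / 2 on diploid-normalized coverage), the states 1 to 3, `sigma` and the stay probability are all hypothetical choices, not parameters from this document.

```python
import math


def viterbi_copy_number(coverage, states=(1, 2, 3), sigma=0.1, p_stay=0.99):
    """Most likely copy-number path for normalized per-bin coverage (diploid = 1.0).

    Emissions: Gaussian centred on copy_number / 2 with standard deviation `sigma`.
    Transitions: stay with probability `p_stay`, otherwise move uniformly to
    another state. Initial state probabilities are uniform.
    """
    n_states = len(states)
    log_stay = math.log(p_stay)
    log_move = math.log((1 - p_stay) / (n_states - 1))

    def log_emit(state_idx, x):
        mu = states[state_idx] / 2
        return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

    def log_trans(prev, cur):
        return log_stay if prev == cur else log_move

    score = [math.log(1 / n_states) + log_emit(s, coverage[0]) for s in range(n_states)]
    backpointers = []
    for x in coverage[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states), key=lambda p: score[p] + log_trans(p, s))
            new_score.append(score[best_prev] + log_trans(best_prev, s) + log_emit(s, x))
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [states[s] for s in path]


coverage = [1.02, 0.97, 1.01, 1.48, 1.53, 1.46, 0.99, 1.03]
print(viterbi_copy_number(coverage))
```

The transition penalty discourages isolated state changes, so a single noisy bin is less likely to be called as a copy number change than a run of consistently elevated bins.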
[257] In step 112, the discrete copy number states of each bin region can be used to identify copy number variation in the chromosomal regions. In some cases, all adjacent bin regions with the same copy number can be merged into one segment to report the presence or absence of a copy number variation state. In some cases, various bins can be filtered before being merged with other segments.
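Merging adjacent bins with the same discrete state into segments can be sketched with `itertools.groupby`, assuming the bins are contiguous, sorted and on a single chromosome, and that copy number 2 is the normal diploid state.

```python
from itertools import groupby


def merge_segments(bins, states):
    """Merge adjacent bins sharing a copy-number state into (start, end, state) segments.

    `bins` are contiguous (start, end) tuples in genomic order on one chromosome;
    `states` holds the discrete copy-number call for each bin.
    """
    segments = []
    for state, group in groupby(zip(bins, states), key=lambda item: item[1]):
        members = list(group)
        start = members[0][0][0]
        end = members[-1][0][1]
        segments.append((start, end, state))
    return segments


bins = [(0, 100), (100, 200), (200, 300), (300, 400)]
segments = merge_segments(bins, [2, 2, 3, 2])
# Segments whose state differs from 2 would be reported as copy number variants.
print([seg for seg in segments if seg[2] != 2])
```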
[258] In step 114, the copy number variation can be reported as a graph indicating various positions in the genome and a corresponding increase, decrease or maintenance of copy number at each respective position. In addition, the copy number variation can be used to report a percentage score that indicates how much disease material (or nucleic acids that have a copy number variation) exists in the cell-free polynucleotide sample.
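The document does not specify how the percentage score is computed. One common approach, sketched here under the assumptions of a diploid background and a clonal change affecting a fraction f of the genomes in the sample, uses the expected coverage ratio R = 1 + f * (c - 2) / 2 for a segment at copy number c, so that f = 2 * (R - 1) / (c - 2).

```python
def disease_fraction_from_ratio(observed_ratio: float, copy_number: int) -> float:
    """Fraction of genomes carrying a clonal copy-number change at a segment.

    Assumes a diploid background: if a fraction f of genomes has `copy_number`
    copies and the rest have two, the expected ratio is 1 + f * (copy_number - 2) / 2.
    """
    if copy_number == 2:
        raise ValueError("copy_number must differ from the diploid state")
    f = 2 * (observed_ratio - 1) / (copy_number - 2)
    return min(max(f, 0.0), 1.0)  # clip to a valid fraction


# A single-copy gain (copy number 3) observed at a normalized ratio of 1.05.
print(f"{disease_fraction_from_ratio(1.05, 3):.0%}")
```

Subclonal changes, aneuploid backgrounds or overlapping events would each require a different model; this sketch covers only the simplest case.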
[259] A method for determining copy number variation is shown in Figure 10. In this method, after sequence readings are grouped into families generated from a single parent polynucleotide (1010), the families are quantified, for example, by determining the number of families that map to each of a plurality of different reference sequence locations. CNVs can be determined directly by comparing a quantitative measure of families at each of a plurality of different loci (1016b). Alternatively, one can infer a quantitative measure of families in the population of labeled parent polynucleotides using both a quantitative measure of families and a quantitative measure of family members in each family, for example, as discussed above. Then, CNV can be determined by comparing the inferred quantitative measures at the plurality of loci. In other cases, a hybrid approach can be taken, in which a similar inference of original quantity is made after normalizing for representational bias introduced during the sequencing process, such as GC bias.
[260] B. Detection of Copy Number Variation Using Paired Samples
[261] Copy number variation detection using paired samples shares many of the steps and parameters of the single-sample approach described in this document. However, as depicted in diagram 200 of Figure 2, copy number variation detection using paired samples requires comparing sequence coverage to a control sample rather than to the predicted mappability of the genome. This approach can assist in normalization across windows.
[262] Figure 2 is a diagram, 200, showing a strategy for detecting copy number variation in paired samples. As shown in this document, methods for detecting copy number variation can be implemented as follows. In step 204, a single sample can be sequenced by a nucleic acid sequencing platform known in the art after the sample is extracted and isolated in step 202. This step generates a plurality of genomic fragment sequence readings. Additionally, a sample or control sample is taken from another individual. In some cases, the control individual may be an individual not known to have a disease, whereas the test individual may have or be at risk for a particular disease. In some cases, these sequence readings may contain barcode information. In other examples, barcodes are not used. After sequencing, a quality score is assigned to the readings. In some cases, some readings are not of sufficient quality or length to carry out the subsequent mapping step. Sequencing readings with a quality score of at least 90%, 95%, 99%, 99.9%, 99.99% or 99.999% can be retained by filtering the data set. In other cases, sequencing readings assigned a quality score of less than 90%, 95%, 99%, 99.9%, 99.99% or 99.999% can be filtered out of the data set. In step 206, the genomic fragment readings that meet a specified quality score threshold are mapped to a reference genome, or to a template sequence known not to contain copy number variations. After mapping alignment, a mapping score is assigned to the sequence readings. In some cases, the readings may be sequences unrelated to the copy number variation analysis. For example, some sequence readings may originate from contaminating polynucleotides. Sequencing readings with a mapping score of at least 90%, 95%, 99%, 99.9%, 99.99% or 99.999% can be retained by filtering the data set. In other cases, sequencing readings assigned a mapping score of less than 90%, 95%, 99%, 99.9%, 99.99% or 99.999% can be filtered out of the data set.
[263] After data filtering and mapping, the plurality of sequence readings generates chromosomal regions of coverage for each of the test and control individuals. In step 208, these chromosomal regions can be divided into windows or bins. A window or bin can be at least 5 kb, 10 kb, 25 kb, 30 kb, 35 kb, 40 kb, 50 kb, 60 kb, 75 kb, 100 kb, 150 kb, 200 kb, 500 kb or 1,000 kb. A window or bin can also be about 5 kb, 10 kb, 25 kb, 30 kb, 35 kb, 40 kb, 50 kb, 60 kb, 75 kb, 100 kb, 150 kb, 200 kb, 500 kb or 1,000 kb.
[264] For coverage normalization in step 210, each window or bin is selected to contain about the same number of mappable bases for each of the test and control individuals. In some cases, each window or bin in a chromosomal region may contain exactly the same number of mappable bases. In other cases, each window or bin may contain a different number of mappable bases. In addition, each window or bin may not overlap with an adjacent window or bin. In other cases, a window or bin may overlap another adjacent window or bin. In other cases, a window or bin may overlap by about 1 bp, 2 bp, 3 bp, 4 bp, 5 bp, 10 bp, 20 bp, 25 bp, 50 bp, 100 bp, 200 bp, 250 bp, 500 bp or 1,000 bp. In other cases, a window or bin may overlap by up to 1 bp, 2 bp, 3 bp, 4 bp, 5 bp, 10 bp, 20 bp, 25 bp, 50 bp, 100 bp, 200 bp, 250 bp, 500 bp or 1,000 bp.
[265] In some cases, each of the bin regions is sized to contain about the same number of uniquely mappable bases for each of the test and control individuals. The mappability of each base that comprises a bin region is determined and used to generate a mappability file that contains a representation of readings from the references that are mapped back to the reference for each bin. The mappability file contains one line for each position, indicating whether or not each position is uniquely mappable.
[266] In addition, predefined windows known throughout the genome to be hard to sequence, or to contain a substantially high GC bias, can be filtered out of the data set. For example, regions known to fall near the chromosome centromere (i.e., centromeric DNA) are known to contain highly repetitive sequences that can produce false positive results. These regions can be filtered out. Other regions of the genome, such as regions that contain an unusually high concentration of other highly repetitive sequences, such as microsatellite DNA, can also be filtered out of the data set.
[267] The number of windows analyzed can also vary. In some cases, at least 10, 20, 30, 40, 50, 100, 200, 500, 1,000, 2,000, 5,000, 10,000, 20,000, 50,000 or 100,000 windows are analyzed. In other cases, fewer than 10, 20, 30, 40, 50, 100, 200, 500, 1,000, 2,000, 5,000, 10,000, 20,000, 50,000 or 100,000 windows are analyzed.
[268] For an exemplary genome derived from cell-free polynucleotide sequences, the next step comprises determining the read coverage for each bin region for each of the test and control individuals. This can be accomplished using either barcoded or non-barcoded readings. In cases without barcodes, the previous mapping step provides coverage for different base positions. Sequence readings that have sufficient mapping and quality scores and fall within chromosomal windows that are not filtered out can be counted. The number of coverage readings can be assigned a score for each mappable position. In cases involving barcodes, all sequences with the same barcode can be collapsed into one reading, since they are all derived from the same parent molecule. This step reduces biases that may have been introduced during any of the previous steps, such as steps that involve amplification. Only readings with unique barcodes can be counted for each mappable position and influence the assigned score. For this reason, it is important that the barcode ligation step be performed in an optimized manner that introduces the least amount of bias.
[269] In determining the nucleic acid read coverage for each bin, the coverage of each bin can be normalized by the mean coverage of the sample. With such an approach, it may be desirable to sequence both the test and control subjects under similar conditions. The read coverage for each bin can then be expressed as a ratio across corresponding bins.
[270] The nucleic acid read coverage ratio for each bin of the test subject can be determined by dividing the read coverage of each bin region of the test sample by the read coverage of the corresponding bin region of the control sample.
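Paragraphs [269] and [270] can be combined into a short sketch: each sample's bin counts are scaled by that sample's mean bin coverage, and the test value is divided by the control value bin by bin. The inputs are hypothetical per-bin counts for bins shared by both samples.

```python
def normalized_bin_ratios(test_counts, control_counts):
    """Test/control coverage ratio per bin after scaling each sample by its mean."""
    if len(test_counts) != len(control_counts):
        raise ValueError("test and control samples must share the same bins")
    test_mean = sum(test_counts) / len(test_counts)
    control_mean = sum(control_counts) / len(control_counts)
    ratios = []
    for t, c in zip(test_counts, control_counts):
        t_norm = t / test_mean
        c_norm = c / control_mean
        ratios.append(t_norm / c_norm if c_norm > 0 else float("nan"))
    return ratios


test_sample = [200, 210, 190, 300]  # hypothetical per-bin counts; last bin gained
control_sample = [100, 105, 95, 100]
print([round(r, 3) for r in normalized_bin_ratios(test_sample, control_sample)])
```

A gained bin raises the test sample's mean, which lowers the ratios of the unaffected bins below 1; more robust normalizers such as the median are a common alternative.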
[271] After the sequence read coverage ratios have been determined, a stochastic modeling algorithm is applied to convert the normalized ratios for each bin region into discrete copy number states. In some cases, this algorithm may comprise a Hidden Markov Model. In other cases, the stochastic model may comprise dynamic programming, support vector machine, Bayesian modeling, probabilistic modeling, Trellis decoding, Viterbi decoding, expectation maximization, Kalman filtering methodologies or neural networks.
[272] In step 212, the discrete copy number states of each bin region can be used to identify copy number variation in the chromosomal regions. In some cases, all adjacent bin regions with the same copy number can be merged into one segment to report the presence or absence of a copy number variation state. In some cases, various bins can be filtered before being merged with other segments.
[273] In step 214, the copy number variation can be reported as a graph indicating various positions in the genome and a corresponding increase, decrease or maintenance of copy number at each respective position. In addition, the copy number variation can be used to report a percentage score that indicates how much disease material exists in the cell-free polynucleotide sample.
VI. Rare Mutation Detection
[274] Rare mutation detection shares similar features with both copy number variation approaches. However, as depicted in diagram 300 of Figure 3, rare mutation detection compares sequence coverage to a control sample or reference sequence rather than to the relative mappability of the genome. This approach can assist in normalization across windows.
[275] In general, detection of rare mutations can be performed on selectively enriched regions of the genome or transcriptome that are purified and isolated in step 302. As described in this document, specific regions, which may include, without limitation, genes, oncogenes, tumor suppressor genes, promoters, regulatory sequence elements, non-coding regions, miRNAs, snRNAs and the like, can be selectively amplified from a total population of cell-free polynucleotides. In one example, multiplex sequencing can be used, with or without barcode identifiers for individual polynucleotide sequences. In other examples, sequencing can be performed using any nucleic acid sequencing platform known in the art. This step generates a plurality of genomic fragment sequence readings, as in step 304. Additionally, a reference sequence is obtained from a control sample taken from another individual. In some cases, the control individual may be an individual known to have no genetic abnormalities or disease. In some cases, these sequence readings may contain barcode information. In other examples, barcodes are not used. After sequencing, a quality score is assigned to the readings. A quality score can be a representation of readings that indicates, based on a threshold, whether those readings may be useful in subsequent analysis. In some cases, some readings are not of sufficient quality or length to carry out the subsequent mapping step. Sequencing readings with a quality score of at least 90%, 95%, 99%, 99.9%, 99.99% or 99.999% can be retained by filtering the data set. In other cases, sequencing readings assigned a quality score of less than 90%, 95%, 99%, 99.9%, 99.99% or 99.999% can be filtered out of the data set. In step 306, genomic fragment readings that meet a specified quality score threshold are mapped to a reference genome, or to a reference sequence known not to contain rare mutations. After mapping alignment, a mapping score is assigned to the sequence readings. A mapping score can be a representation of readings mapped back to the reference sequence that indicates whether or not each position is uniquely mappable. In some cases, the readings may be sequences unrelated to the rare mutation analysis. For example, some sequence readings may originate from contaminating polynucleotides. Sequencing readings with a mapping score of at least 90%, 95%, 99%, 99.9%, 99.99% or 99.999% can be retained by filtering the data set. In other cases, sequencing readings assigned a mapping score of less than 90%, 95%, 99%, 99.9%, 99.99% or 99.999% can be filtered out of the data set.
[276] For each mappable base, bases that do not meet the minimum mappability threshold, or low-quality bases, can be replaced by the corresponding bases found in the reference sequence.
[277] After filtering and mapping data, the variant bases found between the sequence readings obtained from the individual and the reference sequence are analyzed.
[278] For an exemplary genome derived from cell-free polynucleotide sequences, the next step comprises determining the read coverage for each mappable base position. This can be accomplished using either barcoded or non-barcoded readings. In cases without barcodes, the previous mapping step provides coverage for different base positions. Sequence readings that have sufficient mapping and quality scores can be counted. The number of coverage readings can be assigned a score for each mappable position. In cases involving barcodes, all sequences with the same barcode can be collapsed into one consensus reading, since they are all derived from the same parent molecule. The sequence at each base is set to the most dominant nucleotide reading for that specific location. In addition, the number of unique molecules can be counted at each position to derive simultaneous quantification at each position. This step reduces biases that may have been introduced during any of the previous steps, such as steps that involve amplification. Only readings with unique barcodes can be counted for each mappable position and influence the assigned score.
[279] Once read coverage has been determined and the variant bases relative to the control sequence in each reading have been identified, the frequency of variant bases can be calculated as the number of readings containing the variant divided by the total number of readings. This can be expressed as a ratio for each mappable position in the genome.
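Per-position variant frequency as described in [279], sketched with a hypothetical pileup that maps each position to the bases observed there (either raw readings or collapsed consensus readings).

```python
from collections import Counter


def variant_frequencies(pileup, reference):
    """Fraction of readings carrying a non-reference base at each position.

    `pileup` maps position -> list of observed bases (raw or collapsed readings);
    `reference` maps position -> reference base.
    """
    freqs = {}
    for pos, bases in pileup.items():
        if not bases:
            continue  # no coverage, so no frequency can be computed
        counts = Counter(bases)
        ref = reference[pos]
        variant_reads = sum(n for base, n in counts.items() if base != ref)
        freqs[pos] = variant_reads / len(bases)
    return freqs


pileup = {101: list("AAAAAAAAAT"), 102: list("CCCCCCCCCC")}
reference = {101: "A", 102: "C"}
print(variant_frequencies(pileup, reference))
```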
[280] For each base position, the frequencies of all four nucleotides (cytosine, guanine, thymine and adenine) are analyzed against the reference sequence. A stochastic or statistical modeling algorithm is applied to convert the normalized ratios for each mappable position into discrete frequency states for each base variant. In some cases, this algorithm may comprise one or more of the following: a hidden Markov model, dynamic programming, a support vector machine, probabilistic or Bayesian modeling, trellis decoding, Viterbi decoding, expectation maximization, Kalman filtering methodologies and neural networks.
[281] At step 312, the discrete rare mutation states at each base position can be used to identify a base variant with a high frequency of variance compared to the baseline of the reference sequence. In some cases, the baseline may represent a frequency of at least 0.0001%, 0.001%, 0.01%, 0.1%, 1.0%, 2.0%, 3.0%, 4.0%, 5.0%, 10% or 25%. In other cases, the baseline may represent a frequency of at most 0.0001%, 0.001%, 0.01%, 0.1%, 1.0%, 2.0%, 3.0%, 4.0%, 5.0%, 10% or 25%. In some cases, all adjacent base positions with the base variant or mutation can be merged into one segment to report the presence or absence of a rare mutation. In some cases, various positions can be filtered before being merged with other segments.
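The thresholding and merging described in [281] can be pictured with the small sketch below. As an assumption, a fixed baseline fraction (for example 0.01 for 1.0%) stands in for the discrete states that the statistical model would produce, and "adjacent" means consecutive integer positions; `call_variant_segments` is a hypothetical helper name.

```python
def call_variant_segments(freqs, baseline):
    """Flag positions whose variant frequency exceeds `baseline`, then merge runs of
    consecutive flagged positions into inclusive (start, end) segments."""
    flagged = sorted(pos for pos, f in freqs.items() if f > baseline)
    segments = []
    for pos in flagged:
        if segments and pos == segments[-1][1] + 1:
            segments[-1] = (segments[-1][0], pos)  # extend the current run
        else:
            segments.append((pos, pos))  # start a new run
    return segments
```

Any filtering of individual positions (for example by mapping quality) would be applied to `freqs` before this merge.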
[282] After the variance frequency has been calculated for each base position, the variant with the largest deviation at a specific position in the sequence derived from the individual, relative to the reference sequence, is identified as a rare mutation. In some cases, a rare mutation can be a cancer mutation. In other cases, a rare mutation may be correlated with a disease state.
[283] A rare mutation or variant may comprise a genetic aberration that includes, but is not limited to, a single-base substitution, small indels, transversions, translocations, inversions, deletions, truncations or gene truncations. In some cases, a rare mutation can be at most 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15 or 20 nucleotides in length. In other cases, a rare mutation can be at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15 or 20 nucleotides in length.
[284] In step 314, the presence or absence of a mutation can be displayed graphically, indicating various positions in the genome and a corresponding increase, decrease or maintenance of the mutation frequency at each respective position. In addition, rare mutations can be used to report a percentage score indicating how much disease material exists in the cell-free polynucleotide sample. A confidence score can accompany each detected mutation, given known statistics of typical variances at the reported positions in non-disease reference sequences. Mutations can also be ranked in order of abundance in the individual or by clinically actionable importance.
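One possible form of the confidence score in [284] is sketched below. It assumes (the text does not say this) that background variance at a position in non-disease reference samples is summarized as a per-read error rate and that variant read counts follow a binomial model. The score is one minus the probability of observing at least the given number of variant reads from noise alone; the binomial tail is summed in log space so that large read depths do not overflow.

```python
from math import exp, lgamma, log

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), 0 < p < 1, with terms computed in log space."""
    if k <= 0:
        return 1.0
    log_p, log_q = log(p), log(1.0 - p)
    log_n_fact = lgamma(n + 1)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = log_n_fact - lgamma(i + 1) - lgamma(n - i + 1) + i * log_p + (n - i) * log_q
        total += exp(log_pmf)
    return min(total, 1.0)

def mutation_confidence(variant_reads, total_reads, background_rate):
    """1 - P(at least `variant_reads` variant reads arise from background noise alone)."""
    return 1.0 - binomial_upper_tail(variant_reads, total_reads, background_rate)
```

With a hypothetical background rate of 0.1%, a single variant read out of 1,000 yields a low confidence (roughly 0.37), whereas 20 variant reads out of 1,000 yields a confidence very close to 1.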
[285] Figure 11 shows a method for inferring the frequency of a base or sequence of bases at a particular locus in a population of polynucleotides. Sequence reads are grouped into families generated from an original tagged polynucleotide (1110). For each family, each of one or more bases at the locus is assigned a confidence score. The confidence score can be assigned by any of several known statistical methods and can be based, at least in part, on the frequency at which a base appears among the sequence reads belonging to the family (1112). For example, the confidence score may be the frequency at which the base appears among the sequence reads. As another example, a hidden Markov model can be constructed for each family, so that a maximum likelihood or maximum a posteriori decision can be made based on the frequency of occurrence of a particular base in a single family. As part of this model, the resulting error probability and confidence score for a particular decision can be output as well. A frequency for the base in the original population can then be assigned based on the confidence scores among the families (1114).
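The family-based procedure of Figure 11 can be sketched using the simplest confidence score mentioned above, the frequency of a base within its family. How the per-family scores are combined into a population frequency is not specified in the text; averaging them, so that each original tagged molecule contributes one vote, is an assumption made here for illustration.

```python
from collections import Counter

def family_confidences(family_bases):
    """Confidence per base within one family = its frequency among the family's reads."""
    counts = Counter(family_bases)
    total = len(family_bases)
    return {base: n / total for base, n in counts.items()}

def population_base_frequency(families, base):
    """Estimate a base's frequency in the original population as the mean of its
    per-family confidence scores (one vote per original tagged molecule)."""
    scores = [family_confidences(f).get(base, 0.0) for f in families if f]
    return sum(scores) / len(scores) if scores else 0.0
```

Averaging per family, rather than pooling all reads, keeps a heavily amplified family from dominating the estimate.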
VII. Applications
A. Early Cancer Detection
[286] Numerous cancers can be detected using the methods and systems described in this document. Cancer cells, like most cells, can be characterized by a turnover rate, in which old cells die and are replaced by newer cells. In general, dead cells in contact with the vasculature of a given individual can release DNA or DNA fragments into the bloodstream. The same is true of cancer cells during various stages of the disease. Depending on the stage of the disease, cancer cells can also be characterized by various genetic aberrations, such as copy number variation as well as rare mutations. This phenomenon can be used to detect the presence or absence of cancer in individuals using the methods and systems described in this document.
[287] For example, blood from individuals at risk for cancer can be drawn and prepared as described in this document to generate a population of cell-free polynucleotides. In one example, these can be cell-free DNA. The systems and methods of this disclosure can be employed to detect rare mutations or copy number variations that may be present in certain cancers. The method can help detect the presence of cancer cells in the body despite the absence of symptoms or other hallmarks of the disease.
[288] The types and number of cancers that can be detected include, but are not limited to, blood cancers, brain cancers, lung cancers, skin cancers, nose cancers, throat cancers, liver cancers, bone cancers, lymphomas, pancreatic cancers, bowel cancers, rectal cancers, thyroid cancers, bladder cancers, kidney cancers, mouth cancers, stomach cancers, solid state tumors, heterogeneous tumors, homogeneous tumors and the like.
[289] In the early detection of cancers, any of the systems or methods described in this document, including rare mutation detection or copy number variation detection, can be used to detect cancers. These systems and methods can be used to detect any number of genetic aberrations that may cause or result from cancers. These may include, but are not limited to, mutations, rare mutations, indels, copy number variations, transversions, translocations, inversions, deletions, aneuploidy, partial aneuploidy, polyploidy, chromosomal instability, chromosomal structure alterations, gene fusions, chromosome fusions, gene truncations, gene amplification, gene duplications, chromosomal lesions, DNA lesions, abnormal changes in nucleic acid chemical modifications, abnormal changes in epigenetic patterns, and abnormal changes in nucleic acid methylation, infection and cancer.
[290] Additionally, the methods and systems described in this document can be used to help characterize certain cancers. Genetic data produced by the systems and methods of this disclosure can allow practitioners to better characterize a specific form of cancer. Cancers are often heterogeneous in both composition and staging. Genetic profile data can allow the characterization of specific cancer subtypes, which may be important in the diagnosis or treatment of that specific subtype. This information can also provide an individual or practitioner with clues regarding the prognosis of a specific type of cancer.
B. Monitoring and Prognosis of Cancer
[291] The systems and methods provided in this document can be used to monitor cancers or other diseases already known to be present in a particular individual. This can allow either the individual or a practitioner to adapt treatment options according to the progress of the disease. In this example, the systems and methods described in this document can be used to construct genetic profiles of a particular individual over the course of the disease. In some instances, cancers can progress, becoming more aggressive and genetically unstable. In other instances, cancers may remain benign, inactive, dormant or in remission. The systems and methods of this disclosure can be useful in determining disease progression, remission or recurrence.
[292] In addition, the systems and methods described in this document can be useful in determining the efficacy of a particular treatment option. In one example, a successful treatment may actually increase the amount of copy number variation or rare mutations detected in the individual's blood, as more cancer cells die and shed DNA. In other examples, this may not occur. In still other examples, certain treatment options may be correlated with the genetic profiles of cancers over time, and this correlation can be useful in selecting a therapy. Furthermore, if a cancer is observed to be in remission after treatment, the systems and methods described in this document can be useful in monitoring residual disease or disease recurrence.
[293] For example, mutations occurring within a range of frequencies beginning at a threshold level can be determined from DNA in a sample from an individual, for example, a patient. The mutations can be, for example, cancer-related mutations. The frequency range can extend, for example, from at least 0.1%, at least 1% or at least 5% up to 100%. The sample can be, for example, cell-free DNA or a tumor sample. A course of treatment can be prescribed based on any or all of the mutations occurring within the frequency range, including, for example, their frequencies. A sample can then be taken from the individual at any subsequent time, and the mutations occurring within the original frequency range or a different frequency range can be determined. The course of treatment can be adjusted based on these subsequent measurements.
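The monitoring loop in [293] can be sketched as a range filter applied to two samples taken at different times. The mutation labels and frequencies are hypothetical inputs expressed as fractions (0.001 = 0.1%), and the new/cleared/persistent summary is one illustrative way to compare time points, not a prescribed clinical rule.

```python
def mutations_in_range(frequencies, low, high=1.0):
    """Keep mutations whose frequency lies in [low, high]; frequencies are fractions."""
    return {m: f for m, f in frequencies.items() if low <= f <= high}

def compare_timepoints(baseline, followup, low, high=1.0):
    """Report in-range mutations that appeared, cleared or persisted between samples."""
    before = mutations_in_range(baseline, low, high)
    after = mutations_in_range(followup, low, high)
    return {
        "new": sorted(after.keys() - before.keys()),
        "cleared": sorted(before.keys() - after.keys()),
        "persistent": sorted(before.keys() & after.keys()),
    }
```

For example, with a 0.1% threshold, a mutation that falls from 4% to 0.05% between samples is reported as cleared, while one first seen at 3% in the follow-up sample is reported as new.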
C. Early Detection and Monitoring of Other Diseases or Disease States
[294] The methods and systems described in this document are not limited to detecting rare mutations and copy number variations associated with cancers. Various other diseases and infections can result in other types of conditions that may be suitable for early detection and monitoring. For example, in certain cases, genetic disorders or infectious diseases can cause a degree of genetic mosaicism within an individual. This genetic mosaicism can give rise to copy number variation and rare mutations that could be observed. In another example, the systems and methods of this disclosure can also be used to monitor the genomes of immune cells within the body. Immune cells, such as B cells, can undergo rapid clonal expansion in the presence of certain diseases. Clonal expansions can be monitored using copy number variation detection, and certain immune states can thereby be monitored. In this example, copy number variation analysis can be performed over time to produce a profile of how a particular disease may be progressing.
[295] Furthermore, the systems and methods of this disclosure can be used to monitor systemic infections themselves, such as those caused by a pathogen such as a bacterium or a virus. Copy number variation detection, or even rare mutation detection, can be used to determine how a population of pathogens changes during the course of infection. This can be particularly important during chronic infections, such as HIV/AIDS or hepatitis infections, in which viruses can change life-cycle state and/or mutate into more virulent forms during the course of infection.
[296] Yet another example in which the systems and methods of this disclosure can be used is the monitoring of transplant recipients. Generally, transplanted tissue undergoes some degree of rejection by the body after transplantation. The methods of this disclosure can be used to determine or profile the rejection activities of the host body as immune cells attempt to destroy the transplanted tissue. This can be useful in monitoring the condition of the transplanted tissue as well as in altering the course of treatment or preventing rejection.
[297] Furthermore, the methods of this disclosure can be used to characterize the heterogeneity of an abnormal condition in an individual, the method comprising generating a genetic profile of extracellular polynucleotides in the individual, wherein the genetic profile comprises a plurality of data resulting from copy number variation and rare mutation analyses. In some cases, including but not limited to cancer, a disease can be heterogeneous: the diseased cells may not be identical. In the example of cancer, some tumors are known to comprise different types of tumor cells, some of them at different stages of cancer. In other examples, heterogeneity can comprise multiple foci of disease. Again in the example of cancer, there may be multiple tumor foci, perhaps where one or more foci are the result of metastases that have spread from a primary site.
[298] The methods of this disclosure can be used to generate or define a profile, fingerprint or data set that is a sum of the genetic information derived from the different cells in a heterogeneous disease. This data set can comprise copy number variation and rare mutation analyses, alone or in combination.
D. Early Detection and Monitoring of Other Diseases or Disease States of Fetal Origin
[299] Additionally, the systems and methods of this disclosure can be used to diagnose, prognose, monitor or observe cancers or other diseases of fetal origin. That is, these methodologies can be used in a pregnant individual to diagnose, prognose, monitor or observe cancers or other diseases in an unborn individual whose DNA and other polynucleotides may co-circulate with maternal molecules.
VIII. Terminology
[300] The terminology used in this document is for the purpose of describing particular embodiments only and is not intended to limit the systems and methods of this disclosure. As used in this document, the singular forms "a," "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Furthermore, to the extent that the terms "including," "includes," "having," "has," "with," or variants thereof are used in either the detailed description and/or the claims, such terms are intended to be inclusive in a manner similar to the term "comprising."
[301] Several aspects of the systems and methods of this disclosure are described above with reference to example applications for illustration. It should be understood that numerous specific details, relationships and methods are set forth to provide a full understanding of the systems and methods. One having ordinary skill in the relevant art, however, will readily recognize that the systems and methods can be practiced without one or more of the specific details or with other methods. This disclosure is not limited by the illustrated ordering of acts or events, as some acts may occur in different orders and/or concurrently with other acts or events. Furthermore, not all illustrated acts or events are required to implement a methodology in accordance with this disclosure.
[302] Ranges can be expressed in this document as from "about" one particular value and/or to "about" another particular value. When such a range is expressed, another embodiment includes from the one particular value and/or to the other particular value. Similarly, when values are expressed as approximations by use of the antecedent "about," it will be understood that the particular value forms another embodiment. It will be further understood that the endpoints of each of the ranges are significant both in relation to the other endpoint and independently of the other endpoint. The term "about," as used herein, refers to a range that is 15% more or less than a stated numerical value within the context of the particular usage. For example, about 10 would include a range from 8.5 to 11.5.
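The plus-or-minus 15% definition of "about" is a simple interval calculation; the hypothetical helper below reproduces the worked example.

```python
def about_range(value, tolerance=0.15):
    """Interval covered by 'about value' when 'about' means plus or minus 15%."""
    delta = abs(value) * tolerance
    return (value - delta, value + delta)
```

For example, `about_range(10)` gives approximately `(8.5, 11.5)`.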
Computer Systems
[303] The methods of the present disclosure can be implemented using, or with the aid of, computer systems. Figure 15 shows a computer system 1501 that is programmed or otherwise configured to implement the methods of the present disclosure. The computer system 1501 can regulate various aspects of sample preparation, sequencing and/or analysis. In some examples, the computer system 1501 is configured to perform sample preparation and sample analysis, including nucleic acid sequencing.
[304] The computer system 1501 includes a central processing unit (CPU, also "processor" and "computer processor" in this document) 1505, which can be a single-core or multi-core processor, or a plurality of processors for parallel processing. The computer system 1501 also includes memory or a memory location 1510 (for example, random-access memory, read-only memory, flash memory), an electronic storage unit 1515 (for example, hard disk), a communication interface 1520 (for example, network adapter) for communicating with one or more other systems, and peripheral devices 1525, such as cache, other memory, data storage and/or electronic display adapters. The memory 1510, storage unit 1515, interface 1520 and peripheral devices 1525 are in communication with the CPU 1505 through a communication bus (solid lines), such as a motherboard. The storage unit 1515 can be a data storage unit (or data repository) for storing data. The computer system 1501 can be operatively coupled to a computer network ("network") 1530 with the aid of the communication interface 1520. The network 1530 can be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet. The network 1530 in some cases is a telecommunication and/or data network. The network 1530 can include one or more computer servers, which can enable distributed computing, such as cloud computing. The network 1530, in some cases with the aid of the computer system 1501, can implement a peer-to-peer network, which can enable devices coupled to the computer system 1501 to behave as a client or a server.
[305] The CPU 1505 can execute a sequence of machine-readable instructions, which can be embodied in a program or software. The instructions can be stored in a memory location, such as the memory 1510. Examples of operations performed by the CPU 1505 can include fetch, decode, execute and writeback.
[306] The storage unit 1515 can store files, such as drivers, libraries and saved programs. The storage unit 1515 can store user-generated programs and recorded sessions, as well as output(s) associated with the programs. The storage unit 1515 can store user data, for example, user preferences and user programs. The computer system 1501 in some cases can include one or more additional data storage units that are external to the computer system 1501, such as located on a remote server that is in communication with the computer system 1501 through an intranet or the Internet.
[307] The computer system 1501 can communicate with one or more remote computer systems through the network 1530. For example, the computer system 1501 can communicate with a remote computer system of a user (for example, an operator). Examples of remote computer systems include personal computers (for example, portable PCs), slate or tablet PCs (for example, Apple® iPad, Samsung® Galaxy Tab), telephones, smartphones (for example, Apple® iPhone, Android-enabled devices, Blackberry®) or personal digital assistants. The user can access the computer system 1501 via the network 1530.
[308] The methods described in this document can be implemented by way of machine (for example, computer processor) executable code stored on an electronic storage location of the computer system 1501, such as, for example, on the memory 1510 or the electronic storage unit 1515. The machine-executable or machine-readable code can be provided in the form of software. During use, the code can be executed by the processor 1505. In some cases, the code can be retrieved from the storage unit 1515 and stored on the memory 1510 for ready access by the processor 1505. In some situations, the electronic storage unit 1515 can be precluded, and the machine-executable instructions are stored on the memory 1510.
[309] The code can be precompiled and configured for use with a machine having a processor adapted to execute the code, or it can be compiled during runtime. The code can be supplied in a programming language that can be selected to enable the code to execute in a precompiled or as-compiled fashion.
[310] Aspects of the systems and methods provided in this document, such as the computer system 1501, can be embodied in programming. Various aspects of the technology may be thought of as "products" or "articles of manufacture," typically in the form of machine (or processor) executable code and/or associated data that is carried on or embodied in a type of machine-readable medium. Machine-executable code can be stored on an electronic storage unit, such as memory (for example, read-only memory, random-access memory, flash memory) or a hard disk. "Storage"-type media can include any or all of the tangible memory of computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which can provide non-transitory storage at any time for the software programming. All or portions of the software may at times be communicated through the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer into the computer platform of an application server. Thus, another type of media that can bear the software elements includes optical, electrical and electromagnetic waves, such as those used across physical interfaces between local devices, through wired and optical landline networks and over various air links. The physical elements that carry such waves, such as wired or wireless links, optical links or the like, can also be considered as media bearing the software. As used herein, unless restricted to non-transitory, tangible storage media, terms such as computer- or machine-"readable medium" refer to any medium that participates in providing instructions to a processor for execution.
[311] Hence, a machine-readable medium, such as computer-executable code, can take many forms, including, but not limited to, a tangible storage medium, a carrier-wave medium or a physical transmission medium. Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, such as may be used to implement the databases, etc. shown in the drawings. Volatile storage media include dynamic memory, such as the main memory of such a computer platform. Tangible transmission media include coaxial cables, copper wire and fiber optics, including the wires that comprise a bus within a computer system. Carrier-wave transmission media can take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media therefore include, for example: a floppy disk, a flexible disk, a hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer can read programming code and/or data. Many of these forms of computer-readable media can be involved in carrying one or more sequences of one or more instructions to a processor for execution.
[312] The computer system 1501 can include or be in communication with an electronic display that comprises a user interface (UI) for providing, for example, one or more results of sample analysis. Examples of UIs include, without limitation, a graphical user interface (GUI) and a web-based user interface.
EXAMPLES
EXAMPLE 1 - Prostate Cancer Prognosis and Treatment
[313] A blood sample is taken from an individual with prostate cancer. Previously, an oncologist determined that the individual has stage II prostate cancer and recommended treatment. Cell-free DNA is extracted, isolated, sequenced and analyzed every 6 months after the initial diagnosis.
[314] Cell-free DNA is extracted and isolated from blood using the Qiagen Qubit kit protocol. A carrier DNA is added to increase yields. The DNA is amplified using PCR and universal primers. 10 ng of DNA is sequenced using a massively parallel sequencing approach with an Illumina MiSeq personal sequencer. Sequencing of the cell-free DNA covers 90% of the individual's genome.
[315] Sequence data are assembled and analyzed for copy number variation. Sequence reads are mapped and compared to a healthy individual (control). Based on the number of sequence reads, chromosomal regions are divided into 50 kb non-overlapping regions. Sequence reads are compared to one another and a ratio is determined for each mappable position.
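The binning and ratio step of this example can be sketched as below. The assumptions are: each read is represented only by a hypothetical `(chromosome, start)` pair, counts are normalized by total read depth before taking the sample/control ratio, and the ratio is computed per 50 kb bin rather than per mappable position.

```python
from collections import Counter

BIN_SIZE = 50_000  # 50 kb non-overlapping bins, as in this example

def bin_counts(read_starts, bin_size=BIN_SIZE):
    """Count mapped reads per (chromosome, bin index)."""
    return Counter((chrom, pos // bin_size) for chrom, pos in read_starts)

def copy_ratios(sample_starts, control_starts, bin_size=BIN_SIZE):
    """Depth-normalized sample/control read-count ratio for each control-covered bin."""
    sample = bin_counts(sample_starts, bin_size)
    control = bin_counts(control_starts, bin_size)
    s_total, c_total = sum(sample.values()), sum(control.values())
    if s_total == 0 or c_total == 0:
        return {}  # no reads, so no meaningful ratio
    return {b: (sample[b] / s_total) / (c / c_total) for b, c in control.items()}
```

A ratio near 1.0 suggests the same copy number as the control in that bin; ratios near 0.5 or 1.5 are consistent with a one-copy loss or gain relative to a diploid control.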
[316] A Hidden Markov Model is applied to convert the number of copies into discrete states for each interval.
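A minimal Viterbi decoder for this step might look like the following. Everything about the model is an assumption for illustration: copy-number states of 1 to 4, a Gaussian emission on the depth-normalized ratio with mean equal to the state divided by 2 (a diploid control), one shared standard deviation `sigma`, and a fixed probability `stay` of remaining in the same state between adjacent bins.

```python
from math import log

def viterbi_copy_states(ratios, states=(1, 2, 3, 4), stay=0.99, sigma=0.1):
    """Most likely copy-number state per bin under a simple Gaussian-emission HMM."""
    if not ratios:
        return []
    n = len(states)
    log_stay = log(stay)
    log_switch = log((1.0 - stay) / (n - 1))

    def log_trans(i, j):
        return log_stay if i == j else log_switch

    def log_emit(state, x):
        # Gaussian log-density up to a constant shared by all states (same sigma).
        mu = state / 2.0  # expected ratio for `state` copies vs. a diploid control
        return -((x - mu) ** 2) / (2.0 * sigma**2)

    # Uniform start probabilities; log space avoids numeric underflow.
    score = [log(1.0 / n) + log_emit(s, ratios[0]) for s in states]
    backpointers = []
    for x in ratios[1:]:
        new_score, pointers = [], []
        for j, s in enumerate(states):
            best_i = max(range(n), key=lambda i: score[i] + log_trans(i, j))
            pointers.append(best_i)
            new_score.append(score[best_i] + log_trans(best_i, j) + log_emit(s, x))
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    j = max(range(n), key=lambda i: score[i])
    path = [j]
    for pointers in reversed(backpointers):
        j = pointers[j]
        path.append(j)
    return [states[i] for i in reversed(path)]
```

A high `stay` value penalizes state switches, so isolated noisy bins tend to be absorbed into the surrounding segment rather than called as separate events.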
[317] Reports are generated, mapping genome positions and copy number variation, as shown in Figure 4A (for a healthy individual) and Figure 4B (for the individual with cancer).
[318] These reports, compared with other profiles of individuals with known outcomes, indicate that this particular cancer is aggressive and resistant to treatment. The cell-free tumor burden is 21%. The individual is monitored for 18 months. At month 18, the copy number variation profile begins to increase dramatically, with the cell-free tumor burden rising from 21% to 30%. A comparison is made with the genetic profiles of other individuals with prostate cancer, and it is determined that this increase in copy number variation indicates that the prostate cancer is advancing from stage II to stage III. The original treatment regimen, as prescribed, is no longer treating the cancer. A new treatment is prescribed.
[319] Furthermore, these reports are submitted and accessed electronically via the internet. Analysis of the sequence data takes place at a site other than the individual's location. The report is generated and transmitted to the individual's location. Through an internet-enabled computer, the individual accesses reports reflecting his tumor burden (Figure 4C).
EXAMPLE 2 - Prostate Cancer Remission and Recurrence
[320] A blood sample is taken from a prostate cancer survivor. The individual has previously undergone numerous rounds of chemotherapy and radiation. At the time of testing, the individual presented no symptoms or health issues related to the cancer. Standard scans and tests have revealed the individual to be cancer-free.
[321] Cell-free DNA is extracted and isolated from blood using the Qiagen Qubit kit protocol. A carrier DNA is added to increase yields. The DNA is amplified using PCR and universal primers. 10 ng of DNA is sequenced using a massively parallel sequencing approach with an Illumina MiSeq personal sequencer. 12-mer barcodes are added to individual molecules using a ligation method.
[322] Sequence data are assembled and analyzed for copy number variation. Sequence reads are mapped and compared to a healthy individual (control). Based on the number of sequence reads, chromosomal regions are divided into 40 kb non-overlapping regions. Sequence reads are compared to one another and a ratio is determined for each mappable position.
[323] Reads sharing the same (non-unique) barcode sequence are collapsed into a single read to help normalize for amplification bias.
[324] A Hidden Markov Model is applied to convert the number of copies into discrete states for each interval.
[325] Reports are generated, mapping genome positions and copy number variation, as shown in Figure 5A for an individual with cancer in remission and in Figure 5B for an individual with recurrent cancer.
[326] This report, compared with other profiles of individuals with known outcomes, indicates that at month 18, rare mutation analysis for copy number variation detects a cell-free tumor burden of 5%. An oncologist prescribes treatment again.
EXAMPLE 3 - Thyroid Cancer and Treatment
[327] An individual is known to have stage IV thyroid cancer and is undergoing standard treatment, including radiation therapy with I-131. CT scans are inconclusive as to whether the radiation therapy is destroying the cancerous masses. Blood is drawn before and after the latest radiation session.
[328] Cell-free DNA is extracted and isolated from the blood using the Qiagen Qubit kit protocol. A sample of non-specific bulk DNA is added to the sample preparation reactions to increase yields.
[329] The BRAF gene is known to be mutated at amino acid position 600 in this thyroid cancer. From the population of cell-free DNA, BRAF DNA is selectively amplified using gene-specific primers. 20-mer barcodes are added to the parent molecules as a control for counting reads.
[330] 10 ng of DNA is sequenced using a massively parallel sequencing approach with an Illumina MiSeq personal sequencer.
[331] Sequence data are assembled and analyzed for copy number variation. Sequence reads are mapped and compared to a healthy individual (control). Based on the number of sequence reads, as determined by counting the barcode sequences, chromosomal regions are divided into 50 kb non-overlapping regions. Sequence reads are compared to one another and a ratio is determined for each mappable position.
[332] A Hidden Markov Model is applied to convert the number of copies into discrete states for each interval.
[333] A report is generated, mapping genome positions and copy number variation.
[334] Reports generated before and after treatment are compared. The tumor cell burden jumps from 30% to 60% after the radiation session. This jump in tumor burden is determined to reflect increased necrosis of cancerous tissue relative to normal tissue as a result of treatment. Oncologists recommend that the individual continue the prescribed treatment.
EXAMPLE 4 - Rare Mutation Detection Sensitivity
[335] To determine the detection range for rare mutations present in a DNA population, mixing experiments are performed. DNA sequences, some containing wild-type copies of the TP53, HRAS and MET genes and some containing copies with rare mutations in the same genes, are mixed together in different ratios. The DNA mixtures are prepared so that the ratio or percentage of mutant DNA to wild-type DNA ranges from 100% to 0.01%.
[336] 10 ng of DNA is sequenced for each mixing experiment using a massively parallel sequencing approach with an Illumina MiSeq personal sequencer.
[337] Sequence data are assembled and analyzed for rare mutation detection. Sequence reads are mapped and compared to a reference (control) sequence. Based on the number of sequence reads, the variance frequency for each mappable position is determined.
[338] A Hidden Markov Model is applied to convert the frequency of variance for each mappable position into discrete states for the base position.
[339] A report is generated, mapping genomic base positions and the percentage detection of the rare mutation over baseline, as determined by the reference sequence (Figure 6A).
[340] The results of the mixing experiments, spanning 0.1% to 100%, are plotted on a logarithmic scale, with the measured percentage of DNA carrying a rare mutation plotted as a function of the actual percentage of DNA carrying a rare mutation (Figure 6B). The three genes, TP53, HRAS and MET, are represented. A strong linear correlation is found between the measured and expected rare mutation populations. In addition, these experiments establish a lower sensitivity limit of about 0.1% DNA carrying a rare mutation in a population of non-mutated DNA (Figure 6B).
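The linear relationship on a log-log plot can be quantified by fitting a straight line to log10(measured) against log10(expected). The helper below is a hypothetical sketch, not the analysis behind Figure 6B, and no data from the figure are reproduced; its inputs would be the expected and measured percentages from each mixing experiment. A slope near 1, an intercept near 0 and a correlation near 1 indicate that the measured fractions track the actual ones.

```python
import math

def loglog_fit(expected_pct, measured_pct):
    """Least-squares fit of log10(measured) = intercept + slope * log10(expected).

    Returns (slope, intercept, pearson_r). All inputs must be positive.
    """
    xs = [math.log10(v) for v in expected_pct]
    ys = [math.log10(v) for v in measured_pct]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r
```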
EXAMPLE 5 - Detection of Rare Mutations in Individuals with Prostate Cancer
[341] An individual is suspected of having early-stage prostate cancer. Other clinical tests provide inconclusive results. Blood is drawn from the individual, and cell-free DNA is extracted, isolated, prepared and sequenced.
[342] A panel of oncogenes and tumor suppressor genes is selected for selective amplification with a TaqMan® PCR kit (Invitrogen) using specific primers. The amplified DNA regions include the PIK3CA and TP53 genes.
[343] 10 ng of DNA is sequenced using a massively parallel sequencing approach with an Illumina MiSeq personal sequencer.
[344] Sequence data are assembled and analyzed to detect rare mutations. Sequence reads are mapped and compared with a reference (control) sequence. Based on the number of sequence reads, the variant frequency at each mappable position is determined.
[345] A Hidden Markov Model is applied to convert the variant frequency at each mappable position into discrete states for that base position.
[346] A report is generated mapping genomic base positions and the percentage of the rare mutation detected at each base, as determined relative to the reference sequence (Figure 7A). Rare mutations are found at an incidence of 5% in two genes, PIK3CA and TP53, indicating that the individual has early-stage cancer. Treatment is started.
[347] Furthermore, these reports are presented and accessed electronically via the internet. Sequence data analysis takes place at a site remote from the individual's location. The report is generated and transmitted to the individual's location. Using an internet-enabled computer, the individual accesses reports that reflect his tumor burden (Figure 7B).
EXAMPLE 6 - Detection of Rare Mutations in Individuals with Colorectal Cancer
[348] An individual is suspected of having intermediate-stage colorectal cancer. Other clinical tests provide inconclusive results. Blood is drawn from the individual, and cell-free DNA is extracted.
[349] 10 ng of cell-free genetic material extracted from a single plasma tube is used. The initial genetic material is converted into a set of labeled parent polynucleotides. The tagging included attaching the tags required for sequencing as well as non-unique identifiers for tracing progeny molecules back to their parent nucleic acids. The conversion is carried out through an optimized ligation reaction as described above, and the conversion yield is confirmed by observing the post-ligation molecule size profile. The conversion yield is measured as the percentage of initial starting molecules that have both ends ligated to tags. Conversion using this approach is carried out at high efficiency, for example, at least 50%.
[350] The tagged library is amplified by PCR and enriched for genes primarily associated with colorectal cancer (for example, KRAS, APC, TP53, etc.), and the resulting DNA is sequenced using a massively parallel sequencing approach with an Illumina MiSeq personal sequencer.
[351] Sequence data are assembled and analyzed to detect rare mutations. Sequence reads are collapsed into families, each belonging to a single parent molecule (with error correction during collapsing), and mapped to a reference (control) sequence. Based on the number of sequence reads, the frequency of rare variants (substitutions, insertions, deletions, etc.), copy number variation and heterozygosity (where appropriate) are determined for each mappable position.
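Collapsing reads into families, as in [351], can be sketched as grouping reads that share a non-unique tag and the same mapped start and end coordinates (which together fix the read length), then taking a per-base majority vote. The record fields, the minimum family size and the agreement threshold below are hypothetical; the text does not specify how the consensus is called.

```python
from collections import Counter, defaultdict

def collapse_families(reads, min_family_size=2, min_agreement=0.6):
    """Group reads into families and call a per-base consensus.

    `reads` is an iterable of hypothetical dicts with keys 'tag', 'chrom',
    'start', 'end' and 'seq'. Reads sharing tag, chromosome and mapped
    start/end coordinates are treated as copies of one parent molecule.
    Returns {family key: consensus sequence}; bases without enough agreement
    are reported as 'N'.
    """
    families = defaultdict(list)
    for read in reads:
        key = (read["tag"], read["chrom"], read["start"], read["end"])
        families[key].append(read["seq"])

    consensus = {}
    for key, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # singletons cannot be error-corrected by voting
        if len({len(s) for s in seqs}) != 1:
            continue  # this sketch only handles equal-length reads
        called = []
        for column in zip(*seqs):
            base, count = Counter(column).most_common(1)[0]
            called.append(base if count / len(column) >= min_agreement else "N")
        consensus[key] = "".join(called)
    return consensus
```

Because the tags are non-unique, the mapped coordinates carry part of the burden of distinguishing parent molecules; two molecules with the same tag and identical endpoints would be merged in this sketch.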
[352] A report is generated mapping genomic base positions and the percentage of the rare mutation detected at each base, as determined relative to the reference sequence. Rare mutations are found at incidences of 0.3% and 0.4% in two genes, KRAS and FBXW7, respectively, indicating that the individual has residual cancer. Treatment is started.
[353] Furthermore, these reports are presented and accessed electronically via the internet. Sequence data analysis takes place at a site other than the individual's location. The report is generated and transmitted to the individual's location. Through an internet-enabled computer, the individual accesses reports that reflect his tumor burden.
EXAMPLE 7 - Digital Sequencing Technology
[354] Concentrations of tumor-shed nucleic acids are typically so low that current next-generation sequencing technologies can detect such signals only sporadically or in patients with terminally high tumor burden. The main reason is that these technologies are plagued by biases and error rates that can be orders of magnitude higher than necessary to reliably detect cancer-associated genetic changes in circulating DNA. Presented here is a new sequencing methodology, Digital Sequencing Technology (DST), which increases the sensitivity and specificity of detecting and quantifying rare tumor-derived nucleic acids among germline fragments by at least 1 to 2 orders of magnitude.
[355] The DST architecture is inspired by state-of-the-art digital communication systems that combat the high noise and distortion introduced by modern communication channels and can transmit digital information flawlessly at extremely high data rates. Similarly, today's next-generation sequencing workflows are plagued by extremely high noise and distortion (due to sample preparation, PCR-based amplification and sequencing). Digital sequencing is able to eliminate the error and distortion created by these processes and produce an almost perfect representation of all rare variants (including CNVs).
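The error-suppression idea in [355] can be illustrated with a simple calculation. Assume each read in a family independently carries an error at a given base with probability e. Under plurality voting, the consensus base can only be wrong (or tied) if at least half of the reads are in error, which is a binomial tail, so the tail is an upper bound on the consensus error. With an illustrative 1% per-read error rate, a family of three reads gives a bound of roughly 0.03%. The model ignores errors introduced before or during the first amplification cycle, which are shared by the whole family, so it is a sketch of the principle, not a performance claim for DST.

```python
from math import comb

def consensus_error_bound(per_read_error, family_size):
    """Probability that at least half of a family's reads are wrong at a base.

    Assumes independent per-read errors with rate `per_read_error`. Under
    plurality voting this bounds the chance that the consensus base is wrong;
    errors shared by the whole family (e.g. from the first PCR cycle) are not
    modelled.
    """
    k, e = family_size, per_read_error
    need = (k + 1) // 2  # smallest error count that can tie or beat the correct base
    return sum(comb(k, i) * e**i * (1 - e) ** (k - i) for i in range(need, k + 1))
```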
High Diversity Library Preparation
[356] Unlike conventional sequencing library preparation protocols, in which most of the extracted circulating DNA fragments are lost to inefficient library conversion, the present Digital Sequencing Technology workflow enables the vast majority of starting molecules to be converted and sequenced. This is critically important for detecting rare variants, because there may be only a few somatically mutated molecules in an entire 10 ml tube of blood. The efficient molecular biology conversion process developed allows the highest possible sensitivity for detecting rare variants.
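Why conversion efficiency matters when only a few mutant molecules are present can be shown with a binomial calculation. Assuming each of m mutant molecules is converted independently with probability y (the conversion yield), the chance that at least r of them enter the library is a binomial tail. For example, with three mutant molecules, a 50% yield gives a 0.875 chance that at least one is converted, whereas a 10% yield gives about 0.271. The function name and the independence assumption are illustrative.

```python
from math import comb

def prob_at_least_converted(mutant_molecules, conversion_yield, min_converted=1):
    """Chance that at least `min_converted` of `mutant_molecules` enter the
    library, if each converts independently with probability `conversion_yield`."""
    m, y = mutant_molecules, conversion_yield
    return sum(comb(m, i) * y**i * (1 - y) ** (m - i) for i in range(min_converted, m + 1))
```

Raising `min_converted` models a caller that requires more than one independent supporting molecule before reporting a variant.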
Comprehensive Actionable Oncogene Panel
[357] The workflow designed around the DST platform is flexible and highly customizable, since the targeted regions can be as small as single exons or as broad as whole exomes (or even whole genomes). A standard panel consists of all exonic bases of 15 actionable cancer-related genes and coverage of the hot exons of 36 additional oncogenes/tumor suppressor genes (for example, exons containing at least one somatic mutation reported in COSMIC).
EXAMPLE 8: Analytical Studies
[358] To study the performance of the present technology, its sensitivity in analytical samples was assessed. Varying amounts of DNA from the LNCaP cancer cell line were spiked into a cfDNA background, and somatic mutations were successfully detected down to 0.1% sensitivity (see Figure 13).
Pre-Clinical Studies
[359] The concordance of circulating DNA with tumor gDNA in human xenograft mouse models was investigated. In seven CTC-negative mice, each bearing one of two different human breast cancer tumors, all somatic mutations in the tumor gDNA were also detected in the mice's blood cfDNA using DST, further validating the utility of cfDNA for non-invasive tumor genetic profiling.
Pilot Clinical Studies
Correlation of somatic mutations in tumor biopsies vs. circulating DNA
[360] A pilot study was started on human samples across many different cancer types. The concordance of mutation profiles from tumor-derived circulating cell-free DNA with those derived from corresponding tumor biopsy samples was investigated. Concordance greater than 93% was found between somatic cfDNA and tumor mutation profiles in both colorectal cancer and melanoma across 14 patients (Table 1).
TABLE 1
Patient ID | Stage | Mutant genes in corresponding tumor and percentage of mutant cfDNA
CRC N-1 | II-B | TP53 0.2%
CRC N-2 | II-C | KRAS 0.6%; SMAD4 1.5%; GNAS 1.4%; FBXW7 0.8%
CRC N-3 | III-B | KRAS 1.1%; TP53 1.4%; PIK3CA 1.7%; APC 0.7%
CRC N-4 | III-B | KRAS 0.3%; TP53 0.4%
CRC N-5 | III-B | KRAS 0.04%
CRC N-6 | III-C | KRAS 0.03%
CRC N-7 | IV | PIK3CA 1.3%; KRAS 0.6%; TP53 0.8%
CRC N-8 | IV | APC 0.3%; SMO 0.6%; TP53 0.4%; KRAS 0.0%
CRC N-9 | IV | APC 47.3%; APC 40.2%; KRAS 37.7%; PTEN 0.0%; TP53 12.9%
CRC N-10 | IV | TP53 0.9%
Melanoma N-1 | IV | BRAF 0.2%
Melanoma N-2 | IV | APC 0.3%; EGFR 0.9%; MYC 10.5%
Melanoma N-3 | IV | BRAF 3.3%
Melanoma N-4 | IV | BRAF 0.7%
[361] It should be understood from the above that, although particular implementations have been illustrated and described, various modifications can be made to them and are contemplated herein. Nor is it intended that the invention be limited by the specific examples provided within the specification. Although the invention has been described with reference to the aforementioned specification, the descriptions and illustrations of the preferred embodiments herein are not to be construed in a limiting sense. Furthermore, it should be understood that no aspect of the invention is limited to the specific depictions, configurations or relative proportions set forth herein, which depend on a variety of conditions and variables. Various modifications in the form and details of the embodiments of the invention will be apparent to a person skilled in the art. It is therefore contemplated that the invention shall also cover any such modifications, variations and equivalents.
Claims (11)
1. Method for detecting copy number variation, CHARACTERIZED by the fact that it comprises:
a. sequencing extracellular polynucleotides from a body sample of an individual, wherein each of the extracellular polynucleotides generates a plurality of sequencing reads;
b. filtering out reads that fail to meet a defined threshold;
c. mapping the sequence reads obtained from step (a), after the reads have been filtered, to a reference sequence;
d. quantifying or enumerating the mapped reads in two or more predefined regions of the reference sequence; and
e. determining copy number variation in one or more of the predefined regions by:
i. normalizing the number of reads in the predefined regions relative to one another and/or the number of unique sequence reads in the predefined regions relative to one another; and
ii. comparing the normalized numbers obtained in step (i) with normalized numbers obtained from a control sample.
2. Method for detecting a rare mutation in a cell-free or substantially cell-free sample obtained from an individual, CHARACTERIZED by the fact that it comprises:
a. sequencing extracellular polynucleotides from a body sample of an individual, wherein each of the extracellular polynucleotides generates a plurality of sequencing reads;
b. performing multiplex sequencing of regions, or whole-genome sequencing if enrichment is not performed;
c. filtering out reads that fail to meet a defined threshold;
d. mapping sequence reads derived from sequencing to a reference sequence;
e. identifying a subset of mapped sequence reads that align with a variant of the reference sequence at each mappable base position;
f. for each mappable base position, calculating a ratio of (a) the number of mapped sequence reads that include a variant relative to the reference sequence to (b) the total number of sequence reads for that mappable base position;
g. normalizing the ratios or variant frequencies for each mappable base position and determining potential rare mutation(s) or variant(s); and
h. comparing the resulting number for each of the regions with potential mutation(s) or rare variant(s) to numbers similarly derived from a reference sample.
3. Method for characterizing the heterogeneity of an abnormal condition in an individual, the method CHARACTERIZED by the fact that it comprises generating a genetic profile of extracellular polynucleotides in the individual, wherein the genetic profile comprises a plurality of data resulting from rare mutation and copy number variation analyses.
4. Method according to any one of claims 1 to 3, CHARACTERIZED by the fact that the prevalence/concentration of each rare variant identified in the individual is reported and quantified simultaneously.
5. Method according to any one of claims 1 to 3, CHARACTERIZED by the fact that a confidence score for the prevalence/concentration of rare variants in the individual is reported.
6. Method according to any one of claims 1 to 3, CHARACTERIZED by the fact that the extracellular polynucleotides comprise DNA.
7. Method according to any one of claims 1 to 3, CHARACTERIZED by the fact that it further comprises isolating the extracellular polynucleotides from the body sample.
8. Method according to any one of claims 1 to 3, CHARACTERIZED by the fact that it further comprises the step of determining the percentage of sequences that have copy number variation or a rare variant or mutation in said body sample.
9. Method according to claim 8, CHARACTERIZED by the fact that the determination comprises calculating the percentage of predefined regions with an amount of polynucleotides above or below a predetermined threshold.
10. Method according to any one of claims 1 to 3, CHARACTERIZED by the fact that the individual is suspected of having an abnormal condition.
11. Method according to any one of claims 1 to 3, CHARACTERIZED by the fact that the individual is a pregnant woman.
12. Method according to claim 1 or 2, CHARACTERIZED by the fact that the copy number variation or rare mutation or genetic variant is indicative of a fetal anomaly.
13. Method according to any one of claims 1 to 3, CHARACTERIZED by the fact that it further comprises attaching one or more barcodes to the extracellular polynucleotides or fragments thereof before sequencing.
14. Method according to claim 13, CHARACTERIZED by the fact that each barcode attached to the extracellular polynucleotides or fragments thereof before sequencing is non-unique.
15. Method according to any one of claims 1 to 3, CHARACTERIZED by the fact that it further comprises selectively enriching regions of the individual's genome or transcriptome before sequencing.
16. Method according to any one of claims 1 to 3, CHARACTERIZED by the fact that it further comprises non-selectively enriching regions of the individual's genome or transcriptome before sequencing.
17. Method according to any one of claims 1 to 3, CHARACTERIZED by the fact that it further comprises attaching one or more barcodes to the extracellular polynucleotides or fragments thereof before any amplification or enrichment step.
18. Method according to claim 13, CHARACTERIZED by the fact that the barcode comprises a fixed or semi-random set of oligonucleotides that, in combination with the diversity of molecules sequenced from a selected region, enables unique molecules to be identified.
19. Method according to any one of claims 1 to 3, CHARACTERIZED by the fact that it further comprises amplifying the extracellular polynucleotides or fragments thereof.
20. Method according to any one of claims 1 to 3, CHARACTERIZED by the fact that sequence reads of unique identity are detected based on sequence information at the start and end (stop) regions of the sequence read and on the length of the sequence read.
21. Method according to any one of claims 1 to 3, CHARACTERIZED by the fact that samples taken from the same individual at subsequent time intervals are analyzed and compared with results from previous samples.
22. Method according to any one of claims 1 to 3, CHARACTERIZED by the fact that the genetic profile of a tumor, an infection or other tissue anomaly is inferred.
23. Method for detecting a rare mutation in a cell-free or substantially cell-free sample obtained from an individual, CHARACTERIZED by the fact that it comprises:
a. sequencing extracellular polynucleotides from a body sample of an individual, wherein each of the extracellular polynucleotides generates a plurality of sequencing reads;
b. filtering out reads that fail to meet a defined quality threshold;
c. mapping sequence reads derived from sequencing to a reference sequence;
d. identifying a subset of mapped sequence reads that align with a variant of the reference sequence at each mappable base position;
e. for each mappable base position, calculating a ratio of (a) the number of mapped sequence reads that include a variant relative to the reference sequence to (b) the total number of sequence reads for that mappable base position;
f. normalizing the ratios or variant frequencies for each mappable base position and determining potential rare variant(s) or other genetic change(s); and
g. comparing the resulting number for each of the regions with potential mutation(s) or rare variant(s) to numbers similarly derived from a reference sample.
24. Method, CHARACTERIZED by the fact that it comprises:
providing initial starting genetic material obtained from a body sample of an individual;
converting double-stranded polynucleotides of the initial starting genetic material into at least one set of non-uniquely labeled parent polynucleotides, wherein each polynucleotide in a set is mappable to a reference sequence; and
for each set of labeled parent polynucleotides:
i. amplifying the labeled parent polynucleotides in a set to produce a corresponding set of amplified progeny polynucleotides;
ii. sequencing the set of amplified progeny polynucleotides to produce a set of sequencing reads;
iii. collapsing the set of sequencing reads to generate a set of consensus sequences, wherein collapsing uses sequence information from a tag and at least one of: sequence information at the start region of the sequence read, sequence information at the end region of the sequence read, and the length of the sequence read; and wherein each consensus sequence corresponds to a unique polynucleotide among the set of labeled parent polynucleotides; and
iv. analyzing the set of consensus sequences for each set of labeled parent molecules, separately or in combination.
25. Method, according to claim 24, CHARACTERIZED by the fact that the initial genetic material comprises no more than 100 ng of polynucleotides.
26. Method, according to claim 24, CHARACTERIZED by the fact that it comprises restricting the initial starting genetic material before conversion.
27. Method according to claim 24, CHARACTERIZED by the fact that converting comprises any one of blunt-end ligation, cohesive-end ligation, molecular inversion probes, PCR, ligation-based PCR, single-strand ligation and single-strand circularization.
28. Method according to claim 24, CHARACTERIZED by the fact that the starting genetic material is cell-free nucleic acid.
29. Method according to claim 79, CHARACTERIZED by the fact that a plurality of the at least one set of non-uniquely labeled parent polynucleotides map to different mappable positions in a reference sequence of the same genome.
30. Method according to claim 24, CHARACTERIZED by the fact that the set of amplified progeny polynucleotides is of sufficient size that any nucleotide sequence represented in a set of labeled parent polynucleotides at a percentage equal to the per-base sequencing error rate of the sequencing platform used has at least a 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, at least 99%, at least 99.9% or at least 99.99% chance of being represented among the set of consensus sequences.
31. Method according to claim 24, CHARACTERIZED by the fact that it comprises enriching the set of amplified progeny polynucleotides for polynucleotides mapping to one or more selected mappable positions in a reference sequence by: (i) selective amplification of sequences from the initial starting genetic material converted into labeled parent polynucleotides; (ii) selective amplification of labeled parent polynucleotides; (iii) selective sequence capture of amplified progeny polynucleotides; or (iv) selective sequence capture of the initial starting genetic material.
32. Method according to claim 24, CHARACTERIZED by the fact that a given subset of polynucleotides is selected for, or enriched based on, polynucleotide length in base pairs from the initial set of polynucleotides or from the labeled amplified polynucleotides.
33. Method according to claim 24, CHARACTERIZED by the fact that it further comprises providing a plurality of sets of labeled parent polynucleotides, wherein each set maps to a different mappable position in the reference sequence.
34. Method according to claim 33, CHARACTERIZED by the fact that the mappable position in the reference sequence is the locus of a tumor marker and the analysis comprises detecting the tumor marker in the consensus sequence set.
35. Method, according to claim 34, CHARACTERIZED by the fact that the tumor marker is present in a set of consensus sequences at a frequency lower than the error rate introduced in the amplification step.
36. Method according to claim 33, CHARACTERIZED by the fact that the mappable position in the reference sequence comprises a plurality of mappable positions in the reference sequence, wherein each mappable position is the locus of a tumor marker.
37. Method according to claim 33, CHARACTERIZED by the fact that analyzing comprises detecting copy number variation of consensus sequences between at least two sets of parent polynucleotides.
38. Method according to claim 33, CHARACTERIZED by the fact that analyzing comprises detecting the presence of sequence variations relative to the reference sequences and detecting copy number variation of consensus sequences between at least two sets of parent polynucleotides.
39. Method according to claim 24, CHARACTERIZED by the fact that collapsing comprises:
i. grouping sequence reads of amplified progeny polynucleotides into families, wherein each family is amplified from the same labeled parent polynucleotide; and
ii. determining a consensus sequence based on the sequence reads in a family.
40. Method, CHARACTERIZED by the fact that it comprises:
a. providing at least one set of labeled parent polynucleotides, each set mapping to a different mappable position in a reference sequence in one or more genomes, and, for each set of labeled parent polynucleotides:
i. amplifying the first polynucleotides to produce a set of amplified polynucleotides;
ii. sequencing a subset of the set of amplified polynucleotides to produce a set of sequencing reads; and
iii. collapsing the sequence reads by:
1. grouping sequence reads of amplified progeny polynucleotides into families, wherein each family is amplified from the same labeled parent polynucleotide.
41. Method according to claim 40, CHARACTERIZED by the fact that collapsing further comprises:
2. determining a quantitative measure of sequence reads in each family.
42. Method according to claim 41, CHARACTERIZED by the fact that it further comprises:
b. determining a quantitative measure of sequence reads in each family; and
c. based on (1) the quantitative measure of unique families and (2) the quantitative measure of sequence reads in each group, inferring a measure of unique labeled parent polynucleotides in the set.
43. Method according to claim 40, CHARACTERIZED by the fact that at least a subset of the labeled parent polynucleotides in each set are non-uniquely labeled.
44. Method according to claim 40, CHARACTERIZED by the fact that polynucleotides are derived from cell-free polynucleotides, exosomal polynucleotides, bacterial polynucleotides or viral polynucleotides.
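Claim 42 refers to inferring the number of unique labeled parent polynucleotides from the number of families and the number of reads per family, without specifying an estimator. One possible approach, offered here only as an assumption-laden sketch and not as the claimed method, models reads per parent molecule as Poisson, so that observed family sizes follow a zero-truncated Poisson distribution. Solving for the Poisson mean then gives the fraction of parent molecules expected to yield at least one read, and hence an estimate that includes molecules that were never observed.

```python
import math

def estimate_parent_molecules(family_sizes, tol=1e-10):
    """Return (estimated parent molecules, fitted Poisson mean).

    `family_sizes` lists reads per observed family. Assumes reads per parent
    molecule are Poisson, so observed sizes are zero-truncated Poisson.
    Illustrative only.
    """
    if not family_sizes:
        raise ValueError("no families observed")
    observed = len(family_sizes)
    mean_size = sum(family_sizes) / observed
    if mean_size <= 1:
        raise ValueError("mean family size must exceed 1")

    def truncated_mean(lam):
        # Mean of Poisson(lam) conditioned on being >= 1: lam / (1 - e^-lam).
        return lam / -math.expm1(-lam)

    # truncated_mean rises from 1 (as lam -> 0) and always exceeds lam, so the
    # root lies in (0, mean_size); solve by bisection.
    lo, hi = tol, mean_size
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if truncated_mean(mid) < mean_size:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    # Fraction of parent molecules expected to produce at least one read.
    seen_fraction = -math.expm1(-lam)
    return observed / seen_fraction, lam
```

For four observed families with sizes 1, 1, 2 and 4 (mean 2.0), the fitted mean is about 1.59 and the estimate is about 5 parent molecules, one more than the number of families actually seen.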
Patent family:
Publication number | Publication date
US20210139998A1|2021-05-13|
DE202013012824U1|2020-03-10|
KR102210852B1|2021-02-01|
IL237480A|2019-10-31|
MX2015002769A|2015-08-14|
US20210032707A1|2021-02-04|
IL237480D0|2015-04-30|
ES2711635T3|2019-05-06|
JP2020000237A|2020-01-09|
EP2893040B1|2019-01-02|
US20200248270A1|2020-08-06|
EP3470533B1|2019-11-06|
JP2020103298A|2020-07-09|
US10876171B2|2020-12-29|
US20200291487A1|2020-09-17|
CN110872617A|2020-03-10|
US20210130912A1|2021-05-06|
US20180223374A1|2018-08-09|
DK2893040T5|2019-03-18|
US10501810B2|2019-12-10|
EP3591073B1|2021-12-01|
US20150299812A1|2015-10-22|
US20210340632A1|2021-11-04|
EP2893040A4|2016-04-27|
IL269097D0|2019-11-28|
SG11201501662TA|2015-05-28|
GB201509071D0|2015-07-08|
PT2893040T|2019-04-01|
US20200087735A1|2020-03-19|
EP3470533A1|2019-04-17|
US10947600B2|2021-03-16|
EP2893040A1|2015-07-15|
US10457995B2|2019-10-29|
US10738364B2|2020-08-11|
US10041127B2|2018-08-07|
HK1201080A1|2015-08-21|
WO2014039556A1|2014-03-13|
JP2015535681A|2015-12-17|
JP2018027096A|2018-02-22|
US20170218460A1|2017-08-03|
US20200299785A1|2020-09-24|
CN104781421B|2020-06-05|
US20190185940A1|2019-06-20|
US20200087736A1|2020-03-19|
US9598731B2|2017-03-21|
US9840743B2|2017-12-12|
DK2893040T3|2019-03-11|
KR102028375B1|2019-10-04|
SG10202000486VA|2020-03-30|
HK1212396A1|2016-06-10|
US10793916B2|2020-10-06|
US20190177803A1|2019-06-13|
US20220042104A1|2022-02-10|
ES2769241T3|2020-06-25|
US10501808B2|2019-12-10|
US20190185941A1|2019-06-20|
GB2533006B|2017-06-07|
CN104781421A|2015-07-15|
MX367963B|2019-09-11|
US20180171415A1|2018-06-21|
JP6664025B2|2020-03-13|
US10822663B2|2020-11-03|
US10995376B1|2021-05-04|
US10837063B2|2020-11-17|
JP6275145B2|2018-02-07|
PL2893040T3|2019-05-31|
US10683556B2|2020-06-16|
US20170218459A1|2017-08-03|
US20190177802A1|2019-06-13|
US20150368708A1|2015-12-24|
US11001899B1|2021-05-11|
EP3842551A1|2021-06-30|
HK1225416B|2017-09-08|
KR20190112843A|2019-10-07|
US20180327862A1|2018-11-15|
GB2533006A|2016-06-08|
US9834822B2|2017-12-05|
US10494678B2|2019-12-03|
KR20150067161A|2015-06-17|
US10961592B2|2021-03-30|
EP3591073A1|2020-01-08|
KR20210013317A|2021-02-03|
CA2883901A1|2014-03-13|
US10876172B2|2020-12-29|
US20210355549A1|2021-11-18|
Cited documents:
Publication number | Filing date | Publication date | Applicant | Title

US604804A|1898-05-31|Shuttle for looms |
US4725536A|1985-09-19|1988-02-16|Genetics Institute, Inc.|Reagent polynucleotide complex with multiple target binding regions, and kit and methods|
US6150517A|1986-11-24|2000-11-21|Gen-Probe|Methods for making oligonucleotide probes for the detection and/or quantitation of non-viral organisms|
US4942124A|1987-08-11|1990-07-17|President And Fellows Of Harvard College|Multiplex sequencing|
US5149625A|1987-08-11|1992-09-22|President And Fellows Of Harvard College|Multiplex analysis of DNA|
US5124246A|1987-10-15|1992-06-23|Chiron Corporation|Nucleic acid multimers and amplified nucleic acid hybridization assays using same|
US5656731A|1987-10-15|1997-08-12|Chiron Corporation|Nucleic acid-amplified immunoassay probes|
US5424186A|1989-06-07|1995-06-13|Affymax Technologies N.V.|Very large scale immobilized polymer synthesis|
US5871928A|1989-06-07|1999-02-16|Fodor; Stephen P. A.|Methods for nucleic acid analysis|
US5800992A|1989-06-07|1998-09-01|Fodor; Stephen P.A.|Method of detecting nucleic acids|
US5143854A|1989-06-07|1992-09-01|Affymax Technologies N.V.|Large scale photolithographic solid phase synthesis of polypeptides and receptor binding screening thereof|
US6551784B2|1989-06-07|2003-04-22|Affymetrix Inc|Method of comparing nucleic acid sequences|
US5744101A|1989-06-07|1998-04-28|Affymax Technologies N.V.|Photolabile nucleoside protecting groups|
US6309822B1|1989-06-07|2001-10-30|Affymetrix, Inc.|Method for comparing copy number of nucleic acid sequences|
US6582908B2|1990-12-06|2003-06-24|Affymetrix, Inc.|Oligonucleotides|
US5925525A|1989-06-07|1999-07-20|Affymetrix, Inc.|Method of identifying nucleotide differences|
US5200314A|1990-03-23|1993-04-06|Chiron Corporation|Polynucleotide capture assay employing in vitro amplification|
WO1992010588A1|1990-12-06|1992-06-25|Affymax Technologies N.V.|Sequencing by hybridization of a target nucleic acid to a matrix of defined oligonucleotides|
US5981179A|1991-11-14|1999-11-09|Digene Diagnostics, Inc.|Continuous amplification reaction|
US5424413A|1992-01-22|1995-06-13|Gen-Probe Incorporated|Branched nucleic acid probes|
US5573905A|1992-03-30|1996-11-12|The Scripps Research Institute|Encoded combinatorial chemical libraries|
US6020124A|1992-04-27|2000-02-01|Trustees Of Dartmouth College|Detection of soluble gene sequences in biological fluids|
US5981176A|1992-06-17|1999-11-09|City Of Hope|Method of detecting and discriminating between nucleic acid sequences|
AU7212494A|1993-06-25|1995-01-17|Affymax Technologies N.V.|Hybridization and sequencing of nucleic acids|
US5500356A|1993-08-10|1996-03-19|Life Technologies, Inc.|Method of nucleic acid sequence selection|
US6309823B1|1993-10-26|2001-10-30|Affymetrix, Inc.|Arrays of nucleic acid probes for analyzing biotransformation genes and methods of using the same|
US5681697A|1993-12-08|1997-10-28|Chiron Corporation|Solution phase nucleic acid sandwich assays having reduced background noise and kits therefor|
CH686982A5|1993-12-16|1996-08-15|Maurice Stroun|Method for diagnosis of cancers.|
US20030017081A1|1994-02-10|2003-01-23|Affymetrix, Inc.|Method and apparatus for imaging a sample on a device|
US5714330A|1994-04-04|1998-02-03|Lynx Therapeutics, Inc.|DNA sequencing by stepwise ligation and cleavage|
US6013445A|1996-06-06|2000-01-11|Lynx Therapeutics, Inc.|Massively parallel signature sequencing by ligation of encoded adaptors|
US5604097A|1994-10-13|1997-02-18|Spectragen, Inc.|Methods for sorting polynucleotides using oligonucleotide tags|
US5695934A|1994-10-13|1997-12-09|Lynx Therapeutics, Inc.|Massively parallel sequencing of sorted polynucleotides|
US5846719A|1994-10-13|1998-12-08|Lynx Therapeutics, Inc.|Oligonucleotide tags for sorting and identification|
US6600996B2|1994-10-21|2003-07-29|Affymetrix, Inc.|Computer-aided techniques for analyzing biological sequences|
EP0709466B1|1994-10-28|2006-09-27|Gen-Probe Incorporated|Compositions and methods for the simultaneous detection and quantification of multiple specific nucleic acid sequences|
US5648245A|1995-05-09|1997-07-15|Carnegie Institution Of Washington|Method for constructing an oligonucleotide concatamer library by rolling circle replication|
US5968740A|1995-07-24|1999-10-19|Affymetrix, Inc.|Method of Identifying a Base in a Nucleic Acid|
GB9516636D0|1995-08-14|1995-10-18|Univ London|In-situ nucleic acid amplification and detection|
US6040138A|1995-09-15|2000-03-21|Affymetrix, Inc.|Expression monitoring by hybridization to high density oligonucleotide arrays|
US5763175A|1995-11-17|1998-06-09|Lynx Therapeutics, Inc.|Simultaneous sequencing of tagged polynucleotides|
US5854033A|1995-11-21|1998-12-29|Yale University|Rolling circle replication reporter systems|
EP0929694A4|1996-03-15|2002-05-02|Penn State Res Found|Detection of extracellular tumor-associated nucleic acid in blood plasma or serum using nucleic acid amplification assays|
DE69739909D1|1996-03-26|2010-07-29|Michael S Kopreski|METHODS USED IN PLASMA OR SERUM EXTRACTED EXTRACELLURAE RNA FOR DIAGNOSIS MONITORING OR EVALUATION OF CANCER|
US6458530B1|1996-04-04|2002-10-01|Affymetrix Inc.|Selecting tag nucleic acids|
US6300077B1|1996-08-14|2001-10-09|Exact Sciences Corporation|Methods for the detection of nucleic acids|
WO1998015644A2|1996-09-27|1998-04-16|The Chinese University Of Hong Kong|Parallel polynucleotide sequencing method|
US6124092A|1996-10-04|2000-09-26|The Perkin-Elmer Corporation|Multiplex polynucleotide capture methods and compositions|
US6117631A|1996-10-29|2000-09-12|Polyprobe, Inc.|Detection of antigens via oligonucleotide antibody conjugates|
US6046005A|1997-01-15|2000-04-04|Incyte Pharmaceuticals, Inc.|Nucleic acid sequencing with solid phase capturable terminators comprising a cleavable linking group|
EP0985142A4|1997-05-23|2006-09-13|Lynx Therapeutics Inc|System and apparaus for sequential processing of analytes|
WO1999028505A1|1997-12-03|1999-06-10|Curagen Corporation|Methods and devices for measuring differential gene expression|
WO2000012687A1|1998-08-28|2000-03-09|Invitrogen Corporation|System for the rapid manipulation of nucleic acid sequences|
US6653077B1|1998-09-04|2003-11-25|Lynx Therapeutics, Inc.|Method of screening for genetic polymorphism|
US6503718B2|1999-01-10|2003-01-07|Exact Sciences Corporation|Methods for detecting mutations using primer extension for detecting disease|
AU2308900A|1999-02-05|2000-08-25|Amersham Pharmacia Biotech Uk Limited|Analysis method|
US6629040B1|1999-03-19|2003-09-30|University Of Washington|Isotope distribution encoded tags for protein identification|
JP2002539849A|1999-03-26|2002-11-26|Whitehead Institute for Biomedical Research|Universal array|
US6964846B1|1999-04-09|2005-11-15|Exact Sciences Corporation|Methods for detecting nucleic acids indicative of cancer|
US6355431B1|1999-04-20|2002-03-12|Illumina, Inc.|Detection of nucleic acid amplification reactions using bead arrays|
US6699661B1|1999-04-20|2004-03-02|Kankyo Engineering Co., Ltd.|Method for determining a concentration of target nucleic acid molecules, nucleic acid probes for the method, and method for analyzing data obtained by the method|
US20030207300A1|2000-04-28|2003-11-06|Matray Tracy J.|Multiplex analytical platform using molecular tags|
US6242186B1|1999-06-01|2001-06-05|Oy Jurilab Ltd.|Method for detecting a risk of cancer and coronary heart disease and kit therefor|
US6326148B1|1999-07-12|2001-12-04|The Regents Of The University Of California|Detection of copy number changes in colon cancer|
US6440706B1|1999-08-02|2002-08-27|Johns Hopkins University|Digital amplification|
US6586177B1|1999-09-08|2003-07-01|Exact Sciences Corporation|Methods for disease detection|
US6849403B1|1999-09-08|2005-02-01|Exact Sciences Corporation|Apparatus and method for drug screening|
JP2003516138A|1999-12-07|2003-05-13|Exact Sciences Corporation|Detection of aerobic digestive neoplasms in the colon|
US6489114B2|1999-12-17|2002-12-03|Bio Merieux|Process for labeling a ribonucleic acid, and labeled RNA fragments which are obtained thereby|
AT411397T|2000-02-07|2008-10-15|Illumina Inc|NUCLEIC ACID PROOF METHOD WITH UNIVERSAL PRIMING|
GB2364054B|2000-03-24|2002-05-29|Smithkline Beecham Corp|Method of amplifying quinolone-resistance-determining-regions and identifying polymorphic variants thereof|
EP1158055A1|2000-05-26|2001-11-28|Xu Qi University of Teaxs Laboratoire de Leucémie Chen|Method for diagnosing cancers|
JP4287652B2|2000-10-24|2009-07-01|The Board of Trustees of the Leland Stanford Junior University|Characterization of genomic DNA by direct multiple processing|
US20020142345A1|2000-12-22|2002-10-03|Nelsen Anita J.|Methods for encoding and decoding complex mixtures in arrayed assays|
US20030049616A1|2001-01-08|2003-03-13|Sydney Brenner|Enzymatic synthesis of oligonucleotide tags|
US6849404B2|2001-05-07|2005-02-01|Bioneer Corporation|Polymerase chain reaction of DNA of which base sequence is completely unidentified|
US7406385B2|2001-10-25|2008-07-29|Applera Corporation|System and method for consensus-calling with per-base quality values for sample assemblies|
US7727720B2|2002-05-08|2010-06-01|Ravgen, Inc.|Methods for detection of genetic disorders|
DE60207979T2|2002-03-05|2006-09-28|Epigenomics Ag|Method and device for determining tissue specificity of free DNA in body fluids|
US20030186251A1|2002-04-01|2003-10-02|Brookhaven Science Associates, Llc|Genome sequence tags|
EP1578994A2|2002-11-11|2005-09-28|Affymetrix, Inc.|Methods for identifying dna copy number changes|
US10229244B2|2002-11-11|2019-03-12|Affymetrix, Inc.|Methods for identifying DNA copy number changes using hidden markov model based estimations|
US7822555B2|2002-11-11|2010-10-26|Affymetrix, Inc.|Methods for identifying DNA copy number changes|
US7704687B2|2002-11-15|2010-04-27|The Johns Hopkins University|Digital karyotyping|
EP1606417A2|2003-03-07|2005-12-21|Rubicon Genomics Inc.|In vitro dna immortalization and whole genome amplification using libraries generated from randomly fragmented dna|
EP2371453A1|2005-03-18|2011-10-05|Fluidigm Corporation|Microfluidic device|
US20040259118A1|2003-06-23|2004-12-23|Macevicz Stephen C.|Methods and compositions for nucleic acid sequence analysis|
CA2531105C|2003-07-05|2015-03-17|The Johns Hopkins University|Method and compositions for detection and enumeration of genetic variations|
EP1524321B2|2003-10-16|2014-07-23|Sequenom, Inc.|Non-invasive detection of fetal genetic traits|
DE10348407A1|2003-10-17|2005-05-19|Widschwendter, Martin, Prof.|Prognostic and diagnostic markers for cell proliferative disorders of breast tissues|
US20070111233A1|2003-10-30|2007-05-17|Bianchi Diana W|Prenatal diagnosis using cell-free fetal DNA in amniotic fluid|
CA2552858A1|2004-01-23|2005-08-04|Lingvitae As|Improving polynucleotide ligation reactions|
US7393665B2|2005-02-10|2008-07-01|Population Genetics Technologies Ltd|Methods and compositions for tagging and identifying polynucleotides|
AU2005214329A1|2004-02-12|2005-09-01|Population Genetics Technologies Ltd|Genetic analysis by sequence-specific sorting|
US20100216153A1|2004-02-27|2010-08-26|Helicos Biosciences Corporation|Methods for detecting fetal nucleic acids and diagnosing fetal abnormalities|
US20060046258A1|2004-02-27|2006-03-02|Lapidus Stanley N|Applications of single molecule sequencing|
WO2005111242A2|2004-05-10|2005-11-24|Parallele Bioscience, Inc.|Digital profiling of polynucleotide populations|
US7276720B2|2004-07-19|2007-10-02|Helicos Biosciences Corporation|Apparatus and methods for analyzing samples|
US20060035258A1|2004-08-06|2006-02-16|Affymetrix, Inc.|Methods for identifying DNA copy number changes|
US7937225B2|2004-09-03|2011-05-03|New York University|Systems, methods and software arrangements for detection of genome copy number variation|
EP1647600A3|2004-09-17|2006-06-28|Affymetrix, Inc.|Methods for identifying biological samples by addition of nucleic acid bar-code tags|
WO2006047787A2|2004-10-27|2006-05-04|Exact Sciences Corporation|Method for monitoring disease progression or recurrence|
US7424371B2|2004-12-21|2008-09-09|Helicos Biosciences Corporation|Nucleic acid analysis|
ITRM20050068A1|2005-02-17|2006-08-18|Istituto Naz Per Le Malattie I|METHOD FOR THE DETECTION OF NUCLEIC ACIDS OF BACTERIAL OR PATENT PATOGEN AGENTS IN URINE.|
WO2006099604A2|2005-03-16|2006-09-21|Compass Genetics, Llc|Methods and compositions for assay readouts on multiple analytical platforms|
ES2313143T3|2005-04-06|2009-03-01|Maurice Stroun|METHOD FOR THE CANCER DIAGNOSIS THROUGH CIRCULATING DNA AND RNA DETECTION.|
US20070020640A1|2005-07-21|2007-01-25|Mccloskey Megan L|Molecular encoding of nucleic acid templates for PCR and other forms of sequence analysis|
US7666593B2|2005-08-26|2010-02-23|Helicos Biosciences Corporation|Single molecule sequencing of captured nucleic acids|
EP1929039B2|2005-09-29|2013-11-20|Keygene N.V.|High throughput screening of mutagenized populations|
WO2007087312A2|2006-01-23|2007-08-02|Population Genetics Technologies Ltd.|Molecular counting|
US20070172839A1|2006-01-24|2007-07-26|Smith Douglas R|Asymmetrical adapters and methods of use thereof|
US8383338B2|2006-04-24|2013-02-26|Roche Nimblegen, Inc.|Methods and systems for uniform enrichment of genomic regions|
US7702468B2|2006-05-03|2010-04-20|Population Diagnostics, Inc.|Evaluating genetic disorders|
KR101356305B1|2006-05-18|2014-01-28|Molecular Profiling Institute, Inc.|System and method for determining individualized medical intervention for a disease state|
US20080124721A1|2006-06-14|2008-05-29|Martin Fuchs|Analysis of rare cell-enriched samples|
FR2904833A1|2006-08-11|2008-02-15|Bioquanta Sarl|Determining the quantity of nucleic acid, particularly DNA or RNA in a sample comprises adding a fluorophore to the sample, measuring fluorescence intensities in response to luminous stimulations and removing the nucleic acids|
US8603749B2|2006-11-15|2013-12-10|Biospherex, LLC|Multitag sequencing ecogenomics analysis-US|
WO2008070144A2|2006-12-06|2008-06-12|Duke University|Imprinted genes and disease|
CA2676244C|2007-01-25|2017-01-17|Kwok-Kin Wong|Use of anti-egfr antibodies in treatment of egfr mutant mediated disease|
ES2635051T3|2007-03-13|2017-10-02|Amgen Inc.|Mutations in K-ras and anti-EGFR antibody therapy|
WO2008148072A2|2007-05-24|2008-12-04|The Brigham And Women's Hospital, Inc.|Disease-associated genetic variations and methods for obtaining and using same|
CA2689356A1|2007-06-01|2008-12-11|454 Life Sciences Corporation|System and meth0d for identification of individual samples from a multiplex mixture|
AU2008261935B2|2007-06-06|2013-05-02|Pacific Biosciences Of California, Inc.|Methods and processes for calling bases in sequence by incorporation methods|
US20100112590A1|2007-07-23|2010-05-06|The Chinese University Of Hong Kong|Diagnosing Fetal Chromosomal Aneuploidy Using Genomic Sequencing With Enrichment|
EA017966B1|2007-07-23|2013-04-30|The Chinese University of Hong Kong|Diagnosing fetal chromosomal aneuploidy using genomic sequencing|
US20090053719A1|2007-08-03|2009-02-26|The Chinese University Of Hong Kong|Analysis of nucleic acids by digital pcr|
AU2008295992B2|2007-09-07|2014-04-17|Fluidigm Corporation|Copy number variation determination, methods and systems|
EP2229587B1|2007-11-21|2016-08-03|Cosmosid Inc.|Genome identification system|
US9524369B2|2009-06-15|2016-12-20|Complete Genomics, Inc.|Processing and analysis of complex nucleic acid sequence data|
CN101999003A|2008-02-12|2011-03-30|Novartis AG|Method for isolating cell free apoptotic or fetal nucleic acids|
US8216789B2|2008-02-27|2012-07-10|University Of Washington|Diagnostic panel of cancer antibodies and methods for use|
WO2009120808A2|2008-03-26|2009-10-01|Sequenom, Inc.|Restriction endonuclease enhanced polymorphic sequence detection|
US8153375B2|2008-03-28|2012-04-10|Pacific Biosciences Of California, Inc.|Compositions and methods for nucleic acid sequencing|
US20110160290A1|2008-05-21|2011-06-30|Muneesh Tewari|Use of extracellular rna to measure disease|
DE102008025656B4|2008-05-28|2016-07-28|Genxpro Gmbh|Method for the quantitative analysis of nucleic acids, markers therefor and their use|
US20090298709A1|2008-05-28|2009-12-03|Affymetrix, Inc.|Assays for determining telomere length and repeated sequence copy number|
US20100041048A1|2008-07-31|2010-02-18|The Johns Hopkins University|Circulating Mutant DNA to Assess Tumor Dynamics|
US20100062494A1|2008-08-08|2010-03-11|President And Fellows Of Harvard College|Enzymatic oligonucleotide pre-adenylation|
US20100069250A1|2008-08-16|2010-03-18|The Board Of Trustees Of The Leland Stanford Junior University|Digital PCR Calibration for High Throughput Sequencing|
EP3216874A1|2008-09-05|2017-09-13|TOMA Biosciences, Inc.|Methods for stratifying and annotating cancer drug treatment options|
US8383345B2|2008-09-12|2013-02-26|University Of Washington|Sequence tag directed subassembly of short sequencing reads into long sequencing reads|
SG10201500567VA|2008-09-20|2015-04-29|Univ Leland Stanford Junior|Noninvasive diagnosis of fetal aneuploidy by sequencing|
EP2859123A4|2012-06-11|2015-12-16|Sequenta Inc|Method of sequence determination using sequence tags|
EP2379748A4|2008-12-23|2012-08-29|Illumina Inc|Multibase delivery for long reads in sequencing by synthesis protocols|
US20100323348A1|2009-01-31|2010-12-23|The Regents Of The University Of Colorado, A Body Corporate|Methods and Compositions for Using Error-Detecting and/or Error-Correcting Barcodes in Nucleic Acid Amplification Process|
US20120165202A1|2009-04-30|2012-06-28|Good Start Genetics, Inc.|Methods and compositions for evaluating genetic markers|
WO2010127186A1|2009-04-30|2010-11-04|Prognosys Biosciences, Inc.|Nucleic acid constructs and methods of use|
EP2446052B1|2009-06-25|2018-08-08|Fred Hutchinson Cancer Research Center|Method of measuring adaptive immunity|
US20190010543A1|2010-05-18|2019-01-10|Natera, Inc.|Methods for simultaneous amplification of target loci|
US20120220478A1|2009-07-20|2012-08-30|Bar Harbor Biotechnology, Inc.|Methods for assessing disease risk|
EP2494065B1|2009-10-26|2015-12-23|Lifecodexx AG|Means and methods for non-invasive diagnosis of chromosomal aneuploidy|
CN102597272A|2009-11-12|2012-07-18|Esoterix Genetic Laboratories, LLC|Copy number analysis of genetic locus|
US9023769B2|2009-11-30|2015-05-05|Complete Genomics, Inc.|cDNA library for nucleic acid sequencing|
US9752187B2|2009-12-11|2017-09-05|Nucleix|Categorization of DNA samples|
US8835358B2|2009-12-15|2014-09-16|Cellular Research, Inc.|Digital counting of individual molecules by stochastic attachment of diverse labels|
US9315857B2|2009-12-15|2016-04-19|Cellular Research, Inc.|Digital counting of individual molecules by stochastic attachment of diverse label-tags|
US9926593B2|2009-12-22|2018-03-27|Sequenom, Inc.|Processes and kits for identifying aneuploidy|
US10388403B2|2010-01-19|2019-08-20|Verinata Health, Inc.|Analyzing copy number variation in the detection of cancer|
US20110177512A1|2010-01-19|2011-07-21|Predictive Biosciences, Inc.|Method for assuring amplification of an abnormal nucleic acid in a sample|
WO2011090556A1|2010-01-19|2011-07-28|Verinata Health, Inc.|Methods for determining fraction of fetal nucleic acid in maternal samples|
US9411937B2|2011-04-15|2016-08-09|Verinata Health, Inc.|Detecting and classifying copy number variation|
EP2875149B1|2012-07-20|2019-12-04|Verinata Health, Inc.|Detecting and classifying copy number variation in a cancer genome|
ES2534986T3|2010-01-19|2015-05-04|Verinata Health, Inc|Simultaneous determination of aneuploidy and fetal fraction|
EP2513341B1|2010-01-19|2017-04-12|Verinata Health, Inc|Identification of polymorphic sequences in mixtures of genomic dna by whole genome sequencing|
US9260745B2|2010-01-19|2016-02-16|Verinata Health, Inc.|Detecting and classifying copy number variation|
US20120100548A1|2010-10-26|2012-04-26|Verinata Health, Inc.|Method for determining copy number variations|
EP2536854B1|2010-02-18|2017-07-19|The Johns Hopkins University|Personalized tumor biomarkers|
US9140689B2|2010-03-14|2015-09-22|Translational Genomics Research Institute|Methods of determining susceptibility of tumors to tyrosine kinase inhibitors|
CN101967517B|2010-03-19|2012-11-07|Huang Lequn|Polymerase chain reaction -free gene detection method|
EP2558854B1|2010-04-16|2018-10-10|Chronix Biomedical|Breast cancer associated circulating nucleic acid biomarkers|
US9255291B2|2010-05-06|2016-02-09|Bioo Scientific Corporation|Oligonucleotide ligation methods for improving data quality and throughput using massively parallel sequencing|
CA2824387C|2011-02-09|2019-09-24|Natera, Inc.|Methods for non-invasive prenatal ploidy calling|
RU2620959C2|2010-12-22|2017-05-30|Natera, Inc.|Methods of noninvasive prenatal paternity determination|
EP2576837B1|2010-06-04|2017-09-06|Chronix Biomedical|Prostate cancer associated circulating nucleic acid biomarkers|
CN102933721B|2010-06-09|2015-12-02|Keygene N.V.|For the composite sequence barcode of high flux screening|
EP2400035A1|2010-06-28|2011-12-28|Technische Universität München|Methods and compositions for diagnosing gastrointestinal stromal tumors|
EP2591433A4|2010-07-06|2017-05-17|Life Technologies Corporation|Systems and methods to detect copy number variation|
JP2013538565A|2010-07-23|2013-10-17|President and Fellows of Harvard College|Methods for detecting disease or symptom signatures in body fluids|
WO2012014877A1|2010-07-29|2012-02-02|TOTO Ltd.|Photocatalyst coated body and photocatalyst coating liquid|
DK2601609T3|2010-08-02|2017-06-06|Population Bio Inc|COMPOSITIONS AND METHODS FOR DISCOVERING MUTATIONS CAUSING GENETIC DISORDERS|
US11031095B2|2010-08-06|2021-06-08|Ariosa Diagnostics, Inc.|Assay systems for determination of fetal copy number variation|
US20120034603A1|2010-08-06|2012-02-09|Tandem Diagnostics, Inc.|Ligation-based detection of genetic variants|
EP2426217A1|2010-09-03|2012-03-07|Centre National de la Recherche Scientifique|Analytical methods for cell free nucleic acids and applications|
EP3211421A1|2010-09-09|2017-08-30|Traxxsson, LLC|Combination methods of diagnosing cancer in a patient|
CA2811185C|2010-09-21|2020-09-22|Population Genetics Technologies Ltd.|Increasing confidence of allele calls with molecular counting|
WO2012042374A2|2010-10-01|2012-04-05|Anssi Jussi Nikolai Taipale|Method of determining number or concentration of molecules|
EP3561159A1|2010-10-08|2019-10-30|President and Fellows of Harvard College|High-throughput single cell barcoding|
US8725422B2|2010-10-13|2014-05-13|Complete Genomics, Inc.|Methods for estimating genome-wide copy number variations|
US9404156B2|2010-10-22|2016-08-02|Cold Spring Harbor Laboratory|Varietal counting of nucleic acids for obtaining genomic copy number information|
WO2012066451A1|2010-11-15|2012-05-24|Pfizer Inc.|Prognostic and predictive gene signature for colon cancer|
CN103403182B|2010-11-30|2015-11-25|The Chinese University of Hong Kong|The heredity relevant to cancer or the detection of molecular distortion|
US9163281B2|2010-12-23|2015-10-20|Good Start Genetics, Inc.|Methods for maintaining the integrity and identification of a nucleic acid template in a multiplex sequencing reaction|
WO2012088348A2|2010-12-23|2012-06-28|Sequenom, Inc.|Fetal genetic variation detection|
SG191818A1|2010-12-30|2013-08-30|Foundation Medicine Inc|Optimization of multigene analysis of tumor samples|
US20140011694A1|2011-01-11|2014-01-09|Via Genomes, Inc.|Methods, systems, databases, kits and arrays for screening for and predicting the risk of an identifying the presence of tumors and cancers|
US20120190021A1|2011-01-25|2012-07-26|Aria Diagnostics, Inc.|Detection of genetic abnormalities|
US20140024539A1|2011-02-02|2014-01-23|Translational Genomics Research Institute|Biomarkers and methods of use thereof|
US20120238464A1|2011-03-18|2012-09-20|Baylor Research Institute|Biomarkers for Predicting the Recurrence of Colorectal Cancer Metastasis|
WO2012129363A2|2011-03-24|2012-09-27|President And Fellows Of Harvard College|Single cell nucleic acid detection and analysis|
US20150065358A1|2011-03-30|2015-03-05|Verinata Health, Inc.|Method for verifying bioassay samples|
PL2697397T3|2011-04-15|2017-08-31|The Johns Hopkins University|Safe sequencing system|
US9347059B2|2011-04-25|2016-05-24|Bio-Rad Laboratories, Inc.|Methods and compositions for nucleic acid analysis|
JP6669430B2|2011-05-06|2020-03-18|New England Biolabs, Inc.|Promote ligation|
SG194745A1|2011-05-20|2013-12-30|Fluidigm Corp|Nucleic acid encoding reactions|
US9752176B2|2011-06-15|2017-09-05|Ginkgo Bioworks, Inc.|Methods for preparative in vitro cloning|
WO2013019075A2|2011-08-01|2013-02-07|Industry-Academic Cooperation Foundation, Yonsei University|Method of preparing nucleic acid molecules|
US10704164B2|2011-08-31|2020-07-07|Life Technologies Corporation|Methods, systems, computer readable media, and kits for sample identification|
US9834766B2|2011-09-02|2017-12-05|Atreca, Inc.|DNA barcodes for multiplexed sequencing|
US8712697B2|2011-09-07|2014-04-29|Ariosa Diagnostics, Inc.|Determination of copy number variations using binomial probability calculations|
US20130079241A1|2011-09-15|2013-03-28|Jianhua Luo|Methods for Diagnosing Prostate Cancer and Predicting Prostate Cancer Relapse|
US10196681B2|2011-10-06|2019-02-05|Sequenom, Inc.|Methods and processes for non-invasive assessment of genetic variations|
US9367663B2|2011-10-06|2016-06-14|Sequenom, Inc.|Methods and processes for non-invasive assessment of genetic variations|
US10424394B2|2011-10-06|2019-09-24|Sequenom, Inc.|Methods and processes for non-invasive assessment of genetic variations|
US20140242588A1|2011-10-06|2014-08-28|Sequenom, Inc|Methods and processes for non-invasive assessment of genetic variations|
US20130102485A1|2011-10-19|2013-04-25|Inhan Lee|Method of Determining a Diseased State in a Subject|
NO3051026T3|2011-10-21|2018-07-28|
EP2768985B1|2011-10-21|2019-03-20|Chronix Biomedical|Colorectal cancer associated circulating nucleic acid biomarkers|
US20130122499A1|2011-11-14|2013-05-16|Viomics, Inc.|System and method of detecting local copy number variation in dna samples|
US20130143747A1|2011-12-05|2013-06-06|Myriad Genetics, Incorporated|Methods of detecting cancer|
WO2013086352A1|2011-12-07|2013-06-13|Chronix Biomedical|Prostate cancer associated circulating nucleic acid biomarkers|
CN104094120B|2011-12-08|2017-04-26|Five3 Genomics, LLC|Mdm2-containing double minute chromosomes and methods therefore|
US20130184165A1|2012-01-13|2013-07-18|Data2Bio|Genotyping by next-generation sequencing|
ES2665071T3|2012-02-17|2018-04-24|Fred Hutchinson Cancer Research Center|Compositions and methods to identify mutations accurately|
WO2013130512A2|2012-02-27|2013-09-06|The University Of North Carolina At Chapel Hill|Methods and uses for molecular tags|
EP2820158B1|2012-02-27|2018-01-10|Cellular Research, Inc.|Compositions and kits for molecular counting|
US9670529B2|2012-02-28|2017-06-06|Population Genetics Technologies Ltd.|Method for attaching a counter sequence to a nucleic acid sample|
WO2013130791A1|2012-02-29|2013-09-06|Dana-Farber Cancer Institute, Inc.|Compositions, kits, and methods for the identification, assessment, prevention, and therapy of cancer|
US9892230B2|2012-03-08|2018-02-13|The Chinese University Of Hong Kong|Size-based analysis of fetal or tumor DNA fraction in plasma|
WO2013138510A1|2012-03-13|2013-09-19|Patel Abhijit Ajit|Measurement of nucleic acid variants using highly-multiplexed error-suppressed deep sequencing|
HUE051845T2|2012-03-20|2021-03-29|Univ Washington Through Its Center For Commercialization|Methods of lowering the error rate of massively parallel dna sequencing using duplex consensus sequencing|
WO2013142213A1|2012-03-20|2013-09-26|Wake Forest University Health Sciences|Methods, systems, and computer readable media for tracking and verifying receipt of contents of a delivery within an organization|
US10053729B2|2012-03-26|2018-08-21|The Johns Hopkins University|Rapid aneuploidy detection|
US8209130B1|2012-04-04|2012-06-26|Good Start Genetics, Inc.|Sequence assembly|
AU2013249012B2|2012-04-19|2019-03-28|The Medical College Of Wisconsin, Inc.|Highly sensitive surveillance using detection of cell free DNA|
CA2873585C|2012-05-14|2021-11-09|Cb Biotechnologies, Inc.|Method for increasing accuracy in quantitative detection of polynucleotides|
AU2013267609C1|2012-05-31|2019-01-03|Board Of Regents, The University Of Texas System|Method for accurate sequencing of DNA|
US11261494B2|2012-06-21|2022-03-01|The Chinese University Of Hong Kong|Method of measuring a fractional concentration of tumor DNA|
WO2014004726A1|2012-06-26|2014-01-03|Caifu Chen|Methods, compositions and kits for the diagnosis, prognosis and monitoring of cancer|
US20160040229A1|2013-08-16|2016-02-11|Guardant Health, Inc.|Systems and methods to detect rare mutations and copy number variation|
GB2528205B|2013-03-15|2020-06-03|Guardant Health Inc|Systems and methods to detect rare mutations and copy number variation|
US20140066317A1|2012-09-04|2014-03-06|Guardant Health, Inc.|Systems and methods to detect rare mutations and copy number variation|
PT2893040T|2012-09-04|2019-04-01|Guardant Health Inc|Systems and methods to detect rare mutations and copy number variation|
EP3561072A1|2012-12-10|2019-10-30|Resolution Bioscience, Inc.|Methods for targeted genomic analysis|
WO2014107548A1|2013-01-05|2014-07-10|Foundation Medicine, Inc.|System and method for outcome tracking and analysis|
WO2014113729A2|2013-01-18|2014-07-24|Foundation Mecicine, Inc.|Methods of treating cholangiocarcinoma|
WO2014152990A1|2013-03-14|2014-09-25|University Of Rochester|System and method for detecting population variation from nucleic acid sequencing data|
CN105358709B|2013-03-15|2018-12-07|Abbott Molecular Inc.|System and method for detecting genome copy numbers variation|
US10119134B2|2013-03-15|2018-11-06|Abvitro Llc|Single cell bar-coding for antibody discovery|
AU2014233373B2|2013-03-15|2019-10-24|Verinata Health, Inc.|Generating cell-free DNA libraries directly from blood|
EP3421613B1|2013-03-15|2020-08-19|The Board of Trustees of the Leland Stanford Junior University|Identification and use of circulating nucleic acid tumor markers|
SG10201707548RA|2013-03-19|2017-10-30|Toppan Printing Co Ltd|Method for predicting sensitivity to egfr inhibitor|
SG11201508985VA|2013-05-23|2015-12-30|Univ Leland Stanford Junior|Transposition into native chromatin for personal epigenomics|
JP2015096049A|2013-11-15|2015-05-21|Toppan Printing Co., Ltd.|Method for predicting long-term success of vegf inhibitor|
AU2014369841B2|2013-12-28|2019-01-24|Guardant Health, Inc.|Methods and systems for detecting genetic variants|
WO2015159293A2|2014-04-14|2015-10-22|Yissum Research Development Company Of The Hebrew University Of Jerusalem Ltd.|A method and kit for determining the tissue or cell origin of dna|
EP3805404A1|2014-05-13|2021-04-14|Board of Regents, The University of Texas System|Gene mutations and copy number alterations of egfr, kras and met|
JP2017522908A|2014-07-25|2017-08-17|University of Washington|Method for determining tissue and / or cell type producing cell-free DNA, and method for identifying disease or abnormality using the same|
AU2015292020B2|2014-07-25|2018-07-05|Bgi Genomics Co., Ltd.|Method and device for determining a ratio of free nucleic acids in a biological sample and use thereof|
US20160053301A1|2014-08-22|2016-02-25|Clearfork Bioscience, Inc.|Methods for quantitative genetic analysis of cell free dna|
EP3192047A4|2014-09-10|2018-03-28|Pathway Genomics Corporation|Health and wellness management methods and systems useful for the practice thereof|
EP3191628A4|2014-09-12|2018-05-02|The Board of Trustees of the Leland Stanford Junior University|Identification and use of circulating nucleic acids|
KR20170125044A|2015-02-10|2017-11-13|The Chinese University of Hong Kong|Mutation detection for cancer screening and fetal analysis|
US10844428B2|2015-04-28|2020-11-24|Illumina, Inc.|Error suppression in sequenced DNA fragments using redundant reads with unique molecular indices|
US20170211140A1|2015-12-08|2017-07-27|Twinstrand Biosciences, Inc.|Adapters, methods, and compositions for duplex sequencing|
US20190085406A1|2016-04-14|2019-03-21|Guardant Health, Inc.|Methods for early detection of cancer|
US10081839B2|2005-07-29|2018-09-25|Natera, Inc|System and method for cleaning noisy genetic data and determining chromosome copy number|
US11111544B2|2005-07-29|2021-09-07|Natera, Inc.|System and method for cleaning noisy genetic data and determining chromosome copy number|
US11111543B2|2005-07-29|2021-09-07|Natera, Inc.|System and method for cleaning noisy genetic data and determining chromosome copy number|
US10083273B2|2005-07-29|2018-09-25|Natera, Inc.|System and method for cleaning noisy genetic data and determining chromosome copy number|
US9424392B2|2005-11-26|2016-08-23|Natera, Inc.|System and method for cleaning noisy genetic data from target individuals using genetic data from genetically related individuals|
US20190010543A1|2010-05-18|2019-01-10|Natera, Inc.|Methods for simultaneous amplification of target loci|
US10316362B2|2010-05-18|2019-06-11|Natera, Inc.|Methods for simultaneous amplification of target loci|
EP2473638B1|2009-09-30|2017-08-09|Natera, Inc.|Methods for non-invasive prenatal ploidy calling|
RU2620959C2|2010-12-22|2017-05-30|Natera, Inc.|Methods of noninvasive prenatal paternity determination|
CA2824387C|2011-02-09|2019-09-24|Natera, Inc.|Methods for non-invasive prenatal ploidy calling|
WO2012129363A2|2011-03-24|2012-09-27|President And Fellows Of Harvard College|Single cell nucleic acid detection and analysis|
PL2697397T3|2011-04-15|2017-08-31|The Johns Hopkins University|Safe sequencing system|
WO2012177792A2|2011-06-24|2012-12-27|Sequenom, Inc.|Methods and processes for non-invasive assessment of a genetic variation|
US20130079241A1|2011-09-15|2013-03-28|Jianhua Luo|Methods for Diagnosing Prostate Cancer and Predicting Prostate Cancer Relapse|
US20140242588A1|2011-10-06|2014-08-28|Sequenom, Inc|Methods and processes for non-invasive assessment of genetic variations|
US9984198B2|2011-10-06|2018-05-29|Sequenom, Inc.|Reducing sequence read count error in assessment of complex genetic variations|
US9367663B2|2011-10-06|2016-06-14|Sequenom, Inc.|Methods and processes for non-invasive assessment of genetic variations|
US10424394B2|2011-10-06|2019-09-24|Sequenom, Inc.|Methods and processes for non-invasive assessment of genetic variations|
US10196681B2|2011-10-06|2019-02-05|Sequenom, Inc.|Methods and processes for non-invasive assessment of genetic variations|
US9892230B2|2012-03-08|2018-02-13|The Chinese University Of Hong Kong|Size-based analysis of fetal or tumor DNA fraction in plasma|
HUE051845T2|2012-03-20|2021-03-29|Univ Washington Through Its Center For Commercialization|Methods of lowering the error rate of massively parallel dna sequencing using duplex consensus sequencing|
CN104428425A|2012-05-04|2015-03-18|Complete Genomics, Inc.|Methods for determining absolute genome-wide copy number variations of complex tumors|
US9920361B2|2012-05-21|2018-03-20|Sequenom, Inc.|Methods and compositions for analyzing nucleic acid|
US11261494B2|2012-06-21|2022-03-01|The Chinese University Of Hong Kong|Method of measuring a fractional concentration of tumor DNA|
US10497461B2|2012-06-22|2019-12-03|Sequenom, Inc.|Methods and processes for non-invasive assessment of genetic variations|
US10876152B2|2012-09-04|2020-12-29|Guardant Health, Inc.|Systems and methods to detect rare mutations and copy number variation|
US20160040229A1|2013-08-16|2016-02-11|Guardant Health, Inc.|Systems and methods to detect rare mutations and copy number variation|
PT2893040T|2012-09-04|2019-04-01|Guardant Health Inc|Systems and methods to detect rare mutations and copy number variation|
GB2528205B|2013-03-15|2020-06-03|Guardant Health Inc|Systems and methods to detect rare mutations and copy number variation|
US10482994B2|2012-10-04|2019-11-19|Sequenom, Inc.|Methods and processes for non-invasive assessment of genetic variations|
US9218450B2|2012-11-29|2015-12-22|Roche Molecular Systems, Inc.|Accurate and fast mapping of reads to genome|
US10504613B2|2012-12-20|2019-12-10|Sequenom, Inc.|Methods and processes for non-invasive assessment of genetic variations|
US20130309666A1|2013-01-25|2013-11-21|Sequenom, Inc.|Methods and processes for non-invasive assessment of genetic variations|
EP2971130A4|2013-03-15|2016-10-05|Nugen Technologies Inc|Sequential sequencing|
EP2981921A1|2013-04-03|2016-02-10|Sequenom, Inc.|Methods and processes for non-invasive assessment of genetic variations|
CN112575075A|2013-05-24|2021-03-30|塞昆纳姆股份有限公司|Methods and processes for non-invasive assessment of genetic variation|
DK3011051T3|2013-06-21|2019-04-23|Sequenom Inc|Method for non-invasive evaluation of genetic variations|
US10577655B2|2013-09-27|2020-03-03|Natera, Inc.|Cell free DNA diagnostic testing standards|
JP6525434B2|2013-10-04|2019-06-05|セクエノム, インコーポレイテッド|Methods and processes for non-invasive assessment of gene mutations|
JP6534191B2|2013-10-21|2019-06-26|ベリナタ ヘルス インコーポレイテッド|Method for improving the sensitivity of detection in determining copy number variation|
KR20210008941A|2013-11-07|2021-01-25|더 보드 어브 트러스티스 어브 더 리랜드 스탠포드 주니어 유니버시티|Cell-free nucleic acids for the analysis of the human microbiome and components thereof|
JP6525473B2|2013-11-13|2019-06-05|ニューゲン テクノロジーズ, インコーポレイテッド|Compositions and methods for identifying replicate sequencing leads|
AU2014369841B2|2013-12-28|2019-01-24|Guardant Health, Inc.|Methods and systems for detecting genetic variants|
WO2016011428A1|2014-07-17|2016-01-21|University Of Pittsburgh - Of The Commonwealth System Of Higher Education|Methods of treating cells containing fusion genes|
EP3090062B1|2013-12-30|2020-08-26|University of Pittsburgh - of the Commonwealth System of Higher Education|Fusion genes associated with progressive prostate cancer|
US9677118B2|2014-04-21|2017-06-13|Natera, Inc.|Methods for simultaneous amplification of target loci|
US10262755B2|2014-04-21|2019-04-16|Natera, Inc.|Detecting cancer mutations and aneuploidy in chromosomal segments|
US10179937B2|2014-04-21|2019-01-15|Natera, Inc.|Detecting mutations and ploidy in chromosomal segments|
EP3805404A1|2014-05-13|2021-04-14|Board of Regents, The University of Texas System|Gene mutations and copy number alterations of egfr, kras and met|
WO2015181718A1|2014-05-26|2015-12-03|Ebios Futura S.R.L.|Method of prenatal diagnosis|
WO2015183872A1|2014-05-30|2015-12-03|Sequenom, Inc.|Chromosome representation determinations|
WO2015184404A1|2014-05-30|2015-12-03|Verinata Health, Inc.|Detecting fetal sub-chromosomal aneuploidies and copy number variations|
ES2890136T3|2014-07-18|2022-01-17|Univ Hong Kong Chinese|Analysis of tissue methylation patterns in a DNA mixture|
GB201412834D0|2014-07-18|2014-09-03|Cancer Rec Tech Ltd|A method for detecting a genetic variant|
WO2016022833A1|2014-08-06|2016-02-11|Nugen Technologies, Inc.|Digital measurements from targeted sequencing|
CN107075564A|2014-12-10|2017-08-18|深圳华大基因研究院|The method and apparatus for determining tumour nucleic acid concentration|
MA40939A|2014-12-12|2017-10-18|Verinata Health Inc|USING THE SIZE OF ACELLULAR DNA FRAGMENTS TO DETERMINE VARIATIONS IN THE NUMBER OF COPIES|
WO2016095093A1|2014-12-15|2016-06-23|天津华大基因科技有限公司|Method for screening tumor, method and device for detecting variation of target region|
US9618474B2|2014-12-18|2017-04-11|Edico Genome, Inc.|Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids|
US10006910B2|2014-12-18|2018-06-26|Agilome, Inc.|Chemically-sensitive field effect transistors, systems, and methods for manufacturing and using the same|
CA2971589C|2014-12-18|2021-09-28|Edico Genome Corporation|Chemically-sensitive field effect transistor|
US9859394B2|2014-12-18|2018-01-02|Agilome, Inc.|Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids|
US9857328B2|2014-12-18|2018-01-02|Agilome, Inc.|Chemically-sensitive field effect transistors, systems and methods for manufacturing and using the same|
US10020300B2|2014-12-18|2018-07-10|Agilome, Inc.|Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids|
EP3289502A4|2014-12-29|2018-09-12|Counsyl, Inc.|Method for determining genotypes in regions of high homology|
CN107406876B|2014-12-31|2021-09-07|夸登特健康公司|Detection and treatment of diseases exhibiting pathological cell heterogeneity and systems and methods for communicating test results|
US10364467B2|2015-01-13|2019-07-30|The Chinese University Of Hong Kong|Using size and number aberrations in plasma DNA for detecting cancer|
EP3967775A1|2015-07-23|2022-03-16|The Chinese University Of Hong Kong|Analysis of fragmentation patterns of cell-free dna|
KR20170125044A|2015-02-10|2017-11-13|더 차이니즈 유니버시티 오브 홍콩|Mutation detection for cancer screening and fetal analysis|
JP6995625B2|2015-05-01|2022-01-14|ガーダント ヘルス, インコーポレイテッド|Diagnostic method|
CA2986036A1|2015-05-18|2016-11-24|Karius, Inc.|Compositions and methods for enriching populations of nucleic acids|
CN107849600A|2015-06-09|2018-03-27|生命技术公司|For the method for molecular labeling, system, composition, kit, device and computer-readable media|
SG11201707649SA|2015-06-24|2017-10-30|Samsung Life Public Welfare Foundation|Method and device for analyzing gene|
CN107922973B|2015-07-07|2019-06-14|远见基因组系统公司|Method and system for the modification detection based on sequencing|
EP3322816B1|2015-07-13|2020-01-01|Agilent Technologies Belgium NV|System and methodology for the analysis of genomic data obtained from a subject|
WO2017015513A1|2015-07-21|2017-01-26|Guardant Health, Inc.|Locked nucleic acids for capturing fusion genes|
US10465245B2|2015-07-29|2019-11-05|Progenity, Inc.|Nucleic acids and methods for detecting chromosomal abnormalities|
EP3329014A2|2015-07-29|2018-06-06|Progenity, Inc.|Systems and methods for genetic analysis|
EP3332037B1|2015-08-07|2021-02-24|University of Pittsburgh- Of the Commonwealth System of Higher Education|Methods for predicting prostate cancer relapse|
AU2016321204A1|2015-09-08|2018-04-12|Cold Spring Harbor Laboratory|Genetic copy number determination using high throughput multiplex sequencing of smashed nucleotides|
KR20180068985A|2015-10-30|2018-06-22|이그젝트 싸이언스 디블롭먼트 컴패니, 엘엘씨|Detection of complex amplification and separation and detection of DNA from plasma|
EP3377655A4|2015-11-16|2018-11-21|Mayo Foundation for Medical Education and Research|Detecting copy number variations|
JP2019507585A|2015-12-17|2019-03-22|ガーダント ヘルス, インコーポレイテッド|Method for determining oncogene copy number by analysis of cell free DNA|
WO2017127741A1|2016-01-22|2017-07-27|Grail, Inc.|Methods and systems for high fidelity sequencing|
CN109072309A|2016-02-02|2018-12-21|夸登特健康公司|Cancer evolution detection and diagnosis|
US10095831B2|2016-02-03|2018-10-09|Verinata Health, Inc.|Using cell-free DNA fragment size to determine copy number variations|
EP3430170A4|2016-03-16|2019-11-27|Dana-Farber Cancer Institute, Inc.|Methods for genome characterization|
US9976181B2|2016-03-25|2018-05-22|Karius, Inc.|Synthetic nucleic acid spike-ins|
ITUA20162640A1|2016-04-15|2017-10-15|Menarini Silicon Biosystems Spa|METHOD AND KIT FOR THE GENERATION OF DNA LIBRARIES FOR PARALLEL MAXIMUM SEQUENCING|
WO2017201081A1|2016-05-16|2017-11-23|Agilome, Inc.|Graphene fet devices, systems, and methods of using the same for sequencing nucleic acids|
WO2017212428A1|2016-06-07|2017-12-14|The Regents Of The University Of California|Cell-free dna methylation patterns for disease and condition analysis|
EP3831958A1|2016-06-30|2021-06-09|Grail, Inc.|Differential tagging of rna for preparation of a cell-free dna/rna sequencing library|
CN107577917A|2016-07-05|2018-01-12|魏霖静|A kind of bioinformatics high performance information management system and data processing method|
US11200963B2|2016-07-27|2021-12-14|Sequenom, Inc.|Genetic copy number alteration classifications|
US9850523B1|2016-09-30|2017-12-26|Guardant Health, Inc.|Methods for multi-resolution analysis of cell-free nucleic acids|
CN109642250A|2016-09-30|2019-04-16|夸登特健康公司|The method of multiresolution analysis for cell-free nucleic acid|
WO2018071595A1|2016-10-12|2018-04-19|Bellwether Bio, Inc.|Determining cell type origin of circulating cell-free dna with molecular counting|
AU2018212272A1|2017-01-25|2019-07-18|Grail, Inc.|Diagnostic applications using nucleic acid fragments|
WO2018081465A1|2016-10-26|2018-05-03|Pathway Genomics Corporation|Systems and methods for characterizing nucleic acid in a biological sample|
CN106566877A|2016-10-31|2017-04-19|天津诺禾致源生物信息科技有限公司|Gene mutation detection method and apparatus|
US10011870B2|2016-12-07|2018-07-03|Natera, Inc.|Compositions and methods for identifying nucleic acid molecules|
US20180166170A1|2016-12-12|2018-06-14|Konstantinos Theofilatos|Generalized computational framework and system for integrative prediction of biomarkers|
BR112019012958A2|2016-12-22|2019-11-26|Guardant Health Inc|methods and systems for nucleic acid molecule analysis|
CN106701956A|2017-01-11|2017-05-24|上海思路迪生物医学科技有限公司|Technology for digitized deep sequencing of ctDNA|
WO2018156418A1|2017-02-21|2018-08-30|Natera, Inc.|Compositions, methods, and kits for isolating nucleic acids|
CN106755547A|2017-03-15|2017-05-31|上海亿康医学检验所有限公司|The Non-invasive detection and its recurrence monitoring method of a kind of carcinoma of urinary bladder|
EP3610034A4|2017-04-12|2020-12-16|Karius, Inc.|Sample preparation methods, systems and compositions|
US20200080158A1|2017-05-15|2020-03-12|Katholieke Universiteit Leuven|Method for analysing cell-free nucleic acids|
WO2018213235A1|2017-05-16|2018-11-22|Life Technologies Corporation|Methods for compression of molecular tagged nucleic acid sequence data|
WO2018213498A1|2017-05-16|2018-11-22|Guardant Health, Inc.|Identification of somatic or germline origin for cell-free dna|
KR102145417B1|2017-05-24|2020-08-19|지니너스 주식회사|Method for generating distribution of background allele frequency for sequencing data obtained from cell-free nucleic acid and method for detecting mutation from cell-free nucleic acid using the same|
EP3635133A4|2017-06-09|2021-03-03|Bellwether Bio, Inc.|Determination of cancer type in a subject by probabilistic modeling of circulating nucleic acid fragment endpoints|
EP3431611A1|2017-07-21|2019-01-23|Menarini Silicon Biosystems S.p.A.|Improved method and kit for the generation of dna libraries for massively parallel sequencing|
KR20200035427A|2017-07-26|2020-04-03|더 차이니즈 유니버시티 오브 홍콩|Augmentation of cancer screening using cell-free viral nucleic acids|
AU2018335405A1|2017-09-20|2020-04-09|Guardant Health, Inc.|Methods and systems for differentiating somatic and germline variants|
CN107688726B|2017-09-21|2021-09-07|深圳市易基因科技有限公司|Method for judging single-gene-disease-related copy number deficiency based on liquid phase capture technology|
US11099202B2|2017-10-20|2021-08-24|Tecan Genomics, Inc.|Reagent delivery system|
WO2019090156A1|2017-11-03|2019-05-09|Guardant Health, Inc.|Normalizing tumor mutation burden|
EP3704265A4|2017-11-03|2021-09-29|Guardant Health, Inc.|Correcting for deamination-induced sequence errors|
CN108197428B|2017-12-25|2020-06-19|西安交通大学|Copy number variation detection method for next generation sequencing technology based on parallel dynamic programming|
CN112365927A|2017-12-28|2021-02-12|安诺优达基因科技有限公司|CNV detection device|
EP3619653B1|2018-01-15|2021-05-19|Illumina Inc.|Deep learning-based variant classifier|
CN108268752B|2018-01-18|2019-02-01|东莞博奥木华基因科技有限公司|A kind of chromosome abnormality detection device|
KR102036609B1|2018-02-12|2019-10-28|바이오코아 주식회사|A method for prenatal diagnosis using digital PCR|
EP3781713A4|2018-04-16|2022-01-12|Memorial Sloan Kettering Cancer Center|Systems and methods for detecting cancer via cfdna screening|
AU2019261597A1|2018-04-24|2020-11-19|Grail, Llc|Systems and methods for using pathogen nucleic acid load to determine whether a subject has a cancer condition|
EP3802878A1|2018-06-04|2021-04-14|Guardant Health, Inc.|Methods and systems for determining the cellular origin of cell-free nucleic acids|
CN109192246B|2018-06-22|2020-10-16|深圳市达仁基因科技有限公司|Method, apparatus and storage medium for detecting chromosomal copy number abnormalities|
CN112752854A|2018-07-23|2021-05-04|夸登特健康公司|Methods and systems for modulating tumor mutational burden by tumor score and coverage|
US20210292851A1|2018-07-27|2021-09-23|Roche Sequencing Solutions, Inc.|Method of monitoring effectiveness of immunotherapy of cancer patients|
EP3841583A1|2018-08-22|2021-06-30|The Regents of the University of California|Sensitively detecting copy number variations from circulating cell-free nucleic acid|
JP2021536232A|2018-08-30|2021-12-27|ガーダント ヘルス, インコーポレイテッド|Methods and systems for detecting contamination between samples|
US20210363586A1|2018-08-31|2021-11-25|Guardant Health, Inc.|Microsatellite instability detection in cell-free dna|
EP3844760A1|2018-08-31|2021-07-07|Guardant Health, Inc.|Genetic variant detection based on merged and unmerged reads|
US20200075124A1|2018-09-04|2020-03-05|Guardant Health, Inc.|Methods and systems for detecting allelic imbalance in cell-free nucleic acid samples|
CN109523520B|2018-10-25|2020-12-18|北京大学第三医院|Chromosome automatic counting method based on deep learning|
US20200131566A1|2018-10-31|2020-04-30|Guardant Health, Inc.|Methods, compositions and systems for calibrating epigenetic partitioning assays|
CN109584961A|2018-12-03|2019-04-05|元码基因科技(北京)股份有限公司|Method based on two generation sequencing technologies detection blood microsatellite instability|
US20200202975A1|2018-12-19|2020-06-25|AiOnco, Inc.|Genetic information processing system with mutation analysis mechanism and method of operation thereof|
CA3119980A1|2018-12-20|2020-06-25|Guardant Health, Inc.|Methods, compositions, and systems for improving recovery of nucleic acid molecules|
CN109712671B|2018-12-20|2020-06-26|北京优迅医学检验实验室有限公司|Gene detection device based on ctDNA, storage medium and computer system|
EP3918089A1|2019-01-31|2021-12-08|Guardant Health, Inc.|Compositions and methods for isolating cell-free dna|
CN109841265B|2019-02-22|2021-09-21|清华大学|Method and system for determining tissue source of plasma free nucleic acid molecules by using fragmentation mode and application|
KR20210132139A|2019-02-27|2021-11-03|가던트 헬쓰, 인크.|Computer Modeling of Loss of Function Based on Allele Frequency|
WO2020176659A1|2019-02-27|2020-09-03|Guardant Health, Inc.|Methods and systems for determining the cellular origin of cell-free dna|
US20210017605A1|2019-05-31|2021-01-21|Guardant Health, Inc.|Methods and systems for improving patient monitoring after surgery|
US20210115502A1|2019-09-30|2021-04-22|Guardant Health, Inc.|Compositions and methods for analyzing cell-free dna in methylation partitioning assays|
WO2021077411A1|2019-10-25|2021-04-29|苏州宏元生物科技有限公司|Chromosome instability detection method, system and test kit|
WO2021081423A1|2019-10-25|2021-04-29|Guardant Health, Inc.|Methods for 3' overhang repair|
WO2021108708A1|2019-11-26|2021-06-03|Guardant Health, Inc.|Methods, compositions and systems for improving the binding of methylated polynucleotides|
US20210398610A1|2020-01-31|2021-12-23|Guardant Health, Inc.|Significance modeling of clonal-level absence of target variants|
US11211147B2|2020-02-18|2021-12-28|Tempus Labs, Inc.|Estimation of circulating tumor fraction using off-target reads of targeted-panel sequencing|
US11211144B2|2020-02-18|2021-12-28|Tempus Labs, Inc.|Methods and systems for refining copy number variation in a liquid biopsy assay|
US20210343363A1|2020-03-11|2021-11-04|Guardant Health, Inc.|Methods for classifying genetic mutations detected in cell-free nucleic acids as tumor or non-tumor origin|
WO2021222828A1|2020-04-30|2021-11-04|Guardant Health, Inc.|Methods for sequence determination using partitioned nucleic acids|
WO2021231862A1|2020-05-14|2021-11-18|Georgia Tech Research Corporation|Methods of detecting the efficacy of anticancer agents|
US20220025468A1|2020-05-14|2022-01-27|Guardant Health, Inc.|Homologous recombination repair deficiency detection|
WO2022026761A1|2020-07-30|2022-02-03|Guardant Health, Inc.|Methods for isolating cell-free dna|
WO2022046947A1|2020-08-25|2022-03-03|Guardant Health, Inc.|Methods and systems for predicting an origin of a variant|
WO2022047213A2|2020-08-27|2022-03-03|Guardant Health, Inc.|Computational detection of copy number variation at a locus in the absence of direct measurement of the locus|
Legal status:
2019-09-17| B06U| Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette]|
2021-10-19| B350| Update of information on the portal [chapter 15.35 patent gazette]|
Priority:
Application number | Filing date | Patent title
US201261696734P| true| 2012-09-04|2012-09-04|
US201261704400P| true| 2012-09-21|2012-09-21|
US201361793997P| true| 2013-03-15|2013-03-15|
US201361845987P| true| 2013-07-13|2013-07-13|
PCT/US2013/058061|WO2014039556A1|2012-09-04|2013-09-04|Systems and methods to detect rare mutations and copy number variation|